DYNAMIC MODELING OF PHONETIC STRUCTURE*
Catherine P. Browman and Louis M. Goldstein†



Abstract. A dynamical approach to phonetics allows utterances to be
represented as compact, linguistically-relevant structures that have
inherent temporal, as well as spatial, properties. This conception
of phonetic structure is exemplified and tested, in a preliminary
way. Vertical movement of the lower lip in nonsense items of the
form [bVbəbVb] was recorded. The vowels were either [i] or [a], and
stress either initial or final. The measured articulatory
trajectories were compared to the sinusoidal curves that would be
generated by an undamped mass-spring dynamical system. The parameter
values of the sinusoids were changed every quarter- or half-cycle,
and were determined from the measured trajectories' durations and
displacements. The frequency parameter was modulated every
half-cycle, according to one of two alternate organizational
hypotheses, consonant/vowel vs. transition. Both organizations
modeled the trajectories closely, with a slight superiority for the
consonant/vowel organization. Stress level had a systematic effect
on the frequency parameter values, with stressed < unstressed <
reduced (i.e., stressed lowest frequency). Word-initial stressed
consonants were matched least well by the generated curves,
suggesting the need for alternative dynamical models.

Much linguistic phonetic research has attempted to characterize phonetic
units in terms of measurable physical parameters or features (Fant, 1973;
Halle & Stevens, 1979; Jakobson, Fant, & Halle, 1951; Ladefoged, 1971).
Basic to these approaches is the view that a phonetic description consists
of a linear sequence of static physical measures, either articulatory
configurations or acoustic parameters. The course of movement from one such
configuration to another has been viewed as secondary. Recently, we have
proposed (Browman & Goldstein, 1984) an alternative approach, one that
characterizes phonetic structure as patterns of articulatory movement, or
gestures, rather than static configurations. While the traditional
approaches have viewed the continuous movement of vocal-tract articulators
over time as "noise" that tends to obscure the segment-like structure of
speech, we have argued that setting out to characterize articulator
movement directly leads not to "noise," but to organized spatio-temporal
structures that can be used as the basis for phonological generalizations
as well as accurate physical description. In our view, then, a phonetic
representation is a characterization of how a physical system (e.g., a
vocal tract) changes over time. In this paper, we begin to explore the form
that such a characterization could take, by attempting to explicitly model
some observed articulatory trajectories.

*In V. Fromkin (Ed.), Phonetic linguistics. New York: Academic Press, in
press.

†Also Departments of Linguistics and Psychology, Yale University.

Although we want to account for how articulators move over time, this
does not mean that time per se must appear as a dimension of the
description. In fact, a dimension of time would be quite problematic,
because of temporal variations introduced by changes in speaking rate and
stress. For example, suppose our phonetic description were to specify the
positions of articulators at successive points in time. As speaking rate
changes, the values at successive time points are all likely to change in
rather complex ways. Such a representation would not, therefore, be very
satisfactory. It would be preferable to describe phonetic structure as a
system which produces behavior that is organized in time, but which does
not require time as a control parameter (as has been suggested, for
example, by Fowler, 1977, 1980). Like conventional phonetic
representations, such a system does not explicitly refer to time. Unlike
these representations, however, it explicitly generates patterns of
articulator movement in time and space.

The dynamical approach to action currently being developed, e.g., by
Kelso and Tuller (1984), and Saltzman and Kelso (1983), provides the kind
of time-free structure that can characterize articulatory movement. The
approach has been applied to certain aspects of speech production (Fowler,
Rubin, Remez, & Turvey, 1980; Kelso, Tuller, & Harris, 1983; Kelso, Tuller,
& Harris, in press), as well as to more general aspects of motor
coordination in biological systems (e.g., Kelso, Holt, Rubin, & Kugler,
1981; Kugler, Kelso, & Turvey, 1980). Previous approaches to motor
coordination (e.g., Hollerbach, 198?) have emphasized the importance of a
time-varying trajectory "plan" for the muscles and joints to follow in the
performance of a coordinated activity, and require an intelligent executive
to ensure that the plan is followed. In the dynamical approach taken by
these investigators, actions are characterized by underlying dynamical
systems, which once set into place, can autonomously regulate the
activities of sets of muscles and joints over time.

A physical example of a dynamical system is a mass-spring system, that
is, a movable object (mass) connected by a spring to some rigid support. If
the mass is pulled, and the spring stretched beyond its equilibrium length,
the mass will begin to oscillate. In the absence of friction, the equation
characterizing motion is seen in (1), and the trajectory of the object
attached to the spring can be seen in Figure 1.



    m\ddot{x} + k(x - x_0) = 0                                        (1)

    where m        = mass of the object
          k        = stiffness of the spring
          x_0      = rest length of the spring
          x        = instantaneous displacement of the object
          \ddot{x} = instantaneous acceleration of the object

Figure 1. Output of undamped mass-spring system. Time (t) is on the abscissa
and displacement (x) is on the ordinate.

Figure 2. Lower lip height and waveform for single token of ['babəbab].
(a) Acoustic waveform with consonant closures and releases marked. (b) Lower
lip height (in mm) over time. Closures and releases from waveform are marked.
(c) As in (b), but with tick marks to indicate displacement and velocity
extrema. Intervals between extrema are labeled as discussed in text.

Notice that an invariant organization (that in (1)) gives rise to the
time-varying trajectory in Figure 1. No point-by-point plan is required to
describe this pattern of movement, and time is not referred to explicitly.
Only the parameter values and the initial conditions need be specified. The
undamped mass-spring equation (1) is a very simple example of a dynamical
system. It is important to note that this system can give rise to a whole
family of trajectories, not just the one portrayed in Figure 1. Different
trajectories can be generated by changing values of the system's parameters.
For example, changing the stiffness of the spring will change the observed
frequency of oscillation. Changes in the rest length of the spring and the
initial displacement of the mass will affect the amplitude of the
oscillations.
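The dependence of the trajectory on the parameter values can be illustrated with a short numerical sketch. It uses the closed-form solution of equation (1) for a mass released from rest. The function names, the parameter values, and the 5 ms time step are illustrative assumptions, not part of the original study.

```python
import math

def mass_spring_trajectory(m, k, x0, x_init, dt, n_steps):
    """Closed-form solution of m*x'' + k*(x - x0) = 0 for a mass released
    from rest at x_init: x(t) = x0 + (x_init - x0) * cos(sqrt(k/m) * t)."""
    omega = math.sqrt(k / m)   # natural frequency in radians per second
    amplitude = x_init - x0    # set by the initial displacement from rest
    return [x0 + amplitude * math.cos(omega * i * dt) for i in range(n_steps)]

def period(m, k):
    """Duration of one full oscillation cycle, in seconds."""
    return 2 * math.pi / math.sqrt(k / m)

# Hypothetical parameter values: quadrupling the stiffness halves the period,
# while the amplitude depends only on the initial displacement from rest.
soft = mass_spring_trajectory(m=1.0, k=100.0, x0=0.0, x_init=1.0, dt=0.005, n_steps=200)
stiff = mass_spring_trajectory(m=1.0, k=400.0, x0=0.0, x_init=1.0, dt=0.005, n_steps=200)
wide = mass_spring_trajectory(m=1.0, k=100.0, x0=0.0, x_init=2.0, dt=0.005, n_steps=200)
```

No time-indexed plan is stored anywhere in this sketch: the whole trajectory follows from the four parameter values and the initial condition.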

This simple mass-spring equation (generally with a linear or non-linear
damping term added) exemplifies the dynamical approach to coordination and
control of movement in biological systems in general, and of speech
articulators in particular. The appeal of this approach lies both in its
potentially simple description of articulatory movements (i.e., only a few
underlying parameters serve to characterize a whole range of movements),
and also in its physical and biological generality. In order to be useful
for phonetics and linguistics, however, such a dynamical system must be
related to phonetic structure. In one early attempt to specify this
relationship, Lindblom (1967) proposed that a dynamical description could
be used to account for speech duration data. More recently, Kelso,
V.-Bateson, Saltzman, and Kay (in press) and Ostry, Keller, and Parush
(1983) have analyzed variation in stress and speaking rate in terms of a
dynamical model. In this paper, we explore a basic linguistic issue that
arises in the attempt to couch phonetic representations in the language of
dynamics, namely, the definitions of the articulatory gestures.

To begin to relate phonetic description to a dynamical system, let us
consider a very simple example. Figure 2b shows the vertical position of a
light-emitting diode (LED) on the lower lip of a speaker of American
English, as she produces the utterance ['babəbab] in the frame
"Say ___ again." The acoustic closures and releases marked on the
articulatory trajectory are determined from the acoustic waveform, shown in
Figure 2a. Note that the lower lip is raised (toward the upper lip) for the
closures and lowered for the vowels. How can this observed lower-lip
trajectory be described in terms of a dynamical system? Clearly the lower
lip is showing an oscillatory pattern, that is, it goes up and down in a
fairly regular way, but it does not show the absolute regularity of our
mass-spring system in Figure 1. For example, the lip is lower in the full
vowels than in the schwa. Thus, a mass-spring organization with constant
parameter values will not generate this lower lip trajectory. However, it
might be possible to generate this kind of trajectory if the parameter
values were changed in the course of the utterance. The underlying dynamical
organization, together with the particular changes of the parameters, would
then serve to characterize the phonetic structure of the utterance.

It is, of course, obvious that a characterization of lower lip position
over time is not a complete phonetic representation. Nonetheless, in very
simple utterances containing only bilabials and a single vowel, it comes
quite close to being an adequate phonetic description. Browman, Goldstein,
Kelso, Rubin, and Saltzman (1984) have shown that an alternating stress
['mama'mama...] sequence can be adequately synthesized using a vocal-tract
simulation controlled by only two mass-spring systems: one for lip aperture
(the distance between the two lips) and one for lip protrusion. Clearly,
however, more complex utterances will require additional dynamical systems,
and relationships among these systems; such interrelationships and their
implications for phonology are discussed in Browman and Goldstein (1984).
Even for the restricted utterances we will be considering here, we simplify
the phonetic characterization by considering only the vertical position of
the lower lip. We ignore horizontal lip displacement, the upper lip, and the
fact that the movements of the lower lip can be decomposed into movements of
the jaw and movements of the lower lip with respect to the jaw. The general
framework we are operating within (the task dynamics of Saltzman & Kelso,
1983) allows us to describe the coordination of multi-articulator gestures,
but this is irrelevant to the present paper, in which we consider only how
to describe a particular articulator trajectory as the output of a dynamical
system.



The undamped mass-spring system with constant parameter values generates
sinusoidal trajectories with constant frequency and amplitude. We will show
that observed trajectories can be directly modeled as sinusoids whose
frequency and amplitude vary at particular points during the utterance. Of
particular interest is how to define these points at which the values are
changed. Since time is not a parameter of the system, they are defined not
with respect to some reference clock, but rather in terms of the inherent
cyclic properties of the dynamical system.

One set of inherently-definable points at which parameter values can be
modulated are the points of minimum and maximum articulator displacement.
Modulation at these points is suggested by studies of articulator movement
that characterize trajectories in terms of opening and closing gestures
(e.g., Kuehn & Moll, 1976; Parush, Ostry, & Munhall, 1983; Sussman,
MacNeilage, & Hanson, 1973). Alternatively, points of peak velocity (both
positive and negative) can also serve as dynamically-definable markers for
modulation. In a simple mass-spring system, velocity peaks occur at the
resting, or equilibrium, position. These different points of change imply
different phonetic organizations, as can be seen with the help of Figure 2c.
Here we see the same articulatory trajectory as in Figure 2b, with the
addition of tick marks that indicate the displacement and velocity extrema.
These points divide the utterance into intervals, each of which has been
labeled either with a C (for consonant) or a V (for vowel). The consonant
intervals are those on either side of a displacement peak, and the vowel
intervals are those on either side of a displacement valley. Points of peak
velocity, indicated by the smaller tick marks on the slopes, separate
consonant intervals from vowel intervals. For example, V1 is the interval
from the minimum position of the lower lip in the frame vowel [ey], to the
point of peak velocity as the lip starts to raise for the initial [b]. C1 is
the interval from this latter peak velocity point to the center of maximum
lower lip height during the [b]. C2 is the interval from this displacement
peak to the peak velocity as the lower lip lowers for the following vowel
[a].

If we change our model parameters only at displacement peaks and valleys,
then successive VC or CV intervals (e.g., V1-C1, C2-V2) will be
characterized with the same set of parameters. This constitutes a phonetic
hypothesis that the articulatory trajectories can be modeled as successive
CV and VC transition gestures, each with their characteristic values for the
dynamical parameters. The parameters for these opening and closing gestures
must take into account both the particular consonant and the particular
vowel. Thus, this hypothesis provides a phonetic structure rather different
from that commonly assumed in linguistics, in that it does not provide a
physical characterization of individual consonants or vowels.

An alternative division of the articulatory trajectories is clearly
possible, if we change parameters at velocity extrema rather than
displacement extrema. In this way, successive C intervals will have a single
characterization, as will successive V intervals (e.g., C1-C2, V2-V3). These
new intervals, then, correspond roughly to consonant and vowel gestures,
rather than to CV and VC transition gestures. Under this hypothesis, the
relationship between the dynamical characterization and more conventional
phonetic representations is somewhat more transparent than it is under the
transition hypothesis. Note, however, that even under this hypothesis,
consonants and vowels are defined in terms of dynamical structures, rather
than as spatial targets.
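The two ways of grouping intervals into gestures can be sketched in a few lines. The fourteen interval labels below are an assumption: they follow the ordering implied by the examples in the text (V1, C1, C2, V2, V3, ...) and by the later statement that each token is divided into seven C and seven V intervals. The function name and the hypothesis keywords are hypothetical.

```python
# Assumed ordering of the fourteen intervals in one token.
INTERVALS = ["V1", "C1", "C2", "V2", "V3", "C3", "C4",
             "V4", "V5", "C5", "C6", "V6", "V7", "C7"]

def group_gestures(intervals, hypothesis):
    """Pair adjacent intervals into two-interval gestures.

    'cv'         : pair neighbours of the same type, so gesture boundaries
                   fall at velocity extrema (consonant and vowel gestures).
    'transition' : pair neighbours of different type, so gesture boundaries
                   fall at displacement extrema (VC and CV transitions).
    """
    if hypothesis not in ("cv", "transition"):
        raise ValueError("hypothesis must be 'cv' or 'transition'")
    want_same_type = hypothesis == "cv"
    gestures = []
    i = 0
    while i < len(intervals) - 1:
        first, second = intervals[i], intervals[i + 1]
        if (first[0] == second[0]) == want_same_type:
            gestures.append((first, second))
            i += 2
        else:
            i += 1  # an edge interval with no partner is left unpaired
    return gestures

cv_gestures = group_gestures(INTERVALS, "cv")
transition_gestures = group_gestures(INTERVALS, "transition")
```

Under the assumed labeling, the C-V grouping leaves the item-edge intervals V1 and C7 unpaired, while the transition grouping pairs all fourteen intervals.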

In this paper, then, we present the results of some preliminary modeling
of articulatory trajectories with sinusoids (the output of an undamped
mass-spring system), under the C-V and transition hypotheses outlined above.
In particular, the two hypotheses will be contrasted with respect to how the
frequency parameter of the sinusoidal model is modulated. The frequency
parameter (proportional to the square root of the stiffness of the
underlying mass-spring system, assuming a unit mass) is of particular
interest, because it controls the duration of a given gesture, and thus
holds the key to how temporal (durational) regularities can be accommodated
in a descriptive system that doesn't include time as a variable. Therefore,
we will examine how the frequency of an articulatory gesture varies as a
function of stress, position within the item, and vowel quality.
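The relation between stiffness, frequency, and gesture duration follows directly from equation (1). The short derivation below is a sketch: the symbols T (period) and d_gesture (gesture duration) are introduced here for illustration, the frequency is expressed in radians per unit time, and the last line assumes, as in the modeling reported below, that a gesture occupies a half-cycle.

```latex
\begin{aligned}
m\ddot{x} + k(x - x_0) = 0 \;&\Rightarrow\; x(t) = x_0 + A\sin(\omega t + \phi),
  \qquad \omega = \sqrt{k/m} \\
m = 1 \;&\Rightarrow\; \omega = \sqrt{k} \\
d_{\text{gesture}} &= \frac{T}{2} = \frac{\pi}{\omega} = \frac{\pi}{\sqrt{k}}
\end{aligned}
```

A stiffer system therefore yields a higher frequency and a shorter gesture, without time appearing as a parameter.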



Method 




Articulatory Trajectories

The trisyllabic nonsense items shown in Table 1 were chosen for analysis.
Stress is either initial or final, with the second syllable always reduced,
and the vowels are either [i] or [a]. The items were recorded by a female
speaker of American English in the carrier sentence "Say ___ again." Table 1
indicates the number of tokens of each of the items that were analyzed.



Table 1

Utterance      No. of Tokens
bibə'bib       11
'bibəbib       14
babə'bab       10
'babəbab       11
Movements of the talker's lips and jaw were tracked using a Selspot system
that recorded displacements, in the mid-sagittal plane, of LEDs placed on
the nose, upper lip, lower lip, and chin. The Selspot output was recorded on
an FM tape recorder and was later digitally sampled at 200 Hz for computer
analysis. To correct the articulator displacements for possible movements of
the head, the Selspot signal for the nose LED was subtracted from each of the
articulator signals. Each resulting articulator trajectory was then smoothed,
using a 25 ms triangular window. For the present purpose, only the vertical
displacement of the lower lip was analyzed.
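The head-movement correction and smoothing steps can be sketched as follows. This is an illustration under stated assumptions, not the original analysis program: at a 200 Hz sampling rate a 25 ms window is taken to span 5 samples, the triangular weights are taken to be 1, 2, 3, 2, 1, and the window is renormalized where it overhangs the ends of the signal. The function names are hypothetical.

```python
def subtract_reference(articulator, nose):
    """Remove head movement by subtracting the nose-LED signal sample by sample."""
    return [a - n for a, n in zip(articulator, nose)]

def triangular_smooth(signal, width=5):
    """Smooth with a triangular window of `width` samples (odd).

    width=5 corresponds to 25 ms at 200 Hz (an assumption about how the
    window length maps onto samples). Near the edges only the samples that
    exist are used, and the weights are renormalized accordingly.
    """
    half = width // 2
    offsets = range(-half, half + 1)
    weights = [half + 1 - abs(o) for o in offsets]  # e.g. [1, 2, 3, 2, 1]
    smoothed = []
    for t in range(len(signal)):
        num = 0.0
        den = 0.0
        for o, w in zip(offsets, weights):
            j = t + o
            if 0 <= j < len(signal):
                num += w * signal[j]
                den += w
        smoothed.append(num / den)
    return smoothed

# Hypothetical samples (arbitrary units) for a lower-lip LED and the nose LED.
lip = [5.0, 6.0, 7.5, 7.0, 6.0]
nose = [1.0, 1.0, 1.5, 1.0, 1.0]
corrected = triangular_smooth(subtract_reference(lip, nose))
```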

Displacement maxima and minima were determined automatically using a
peak-finding algorithm. Instantaneous velocities were computed by taking the
difference of successive displacement samples. The maxima and minima of the
resulting velocity curves were determined using the same program as for the
displacements. Displacement and velocity extrema were used to divide each
token into seven C and seven V intervals, as shown in Figure 2c.
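A minimal sketch of this segmentation step is given below. The original peak-finding algorithm is not described, so a simple strict three-point comparison is assumed here; it ignores plateaus and would need refinement for noisy data. The function names and the sample values are hypothetical.

```python
def velocity(displacement):
    """First difference of successive displacement samples."""
    return [b - a for a, b in zip(displacement, displacement[1:])]

def local_extrema(signal):
    """Return (maxima, minima) as lists of sample indices.

    A sample counts as an extremum only if it is strictly greater (or
    strictly smaller) than both of its neighbours.
    """
    maxima, minima = [], []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] > signal[i + 1]:
            maxima.append(i)
        elif signal[i - 1] > signal[i] < signal[i + 1]:
            minima.append(i)
    return maxima, minima

# One hypothetical closing-and-opening movement of the lower lip.
lip = [0, 1, 3, 6, 8, 9, 8, 6, 3, 1, 0]
vel = velocity(lip)
disp_max, disp_min = local_extrema(lip)   # displacement peak
vel_max, vel_min = local_extrema(vel)     # fastest raising and lowering
```

The same routine is applied to both curves, mirroring the statement that the velocity extrema were found with the same program as the displacement extrema.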

Modeling 

Each successive interval of each token was modeled as the output of a
simple mass-spring system by fitting sinusoids to the articulatory
trajectories. We generated the model trajectories using a sine-wave equation
directly (equation (2)), in order to emphasize the inherent cyclic
properties of dynamic systems. Recall that frequency is related to
stiffness, and amplitude to rest length and maximum displacement. Thus, we
controlled frequency, amplitude, and equilibrium position (rest length).
(Phase is discussed below.) The individual model points x'(j) for an
interval were generated according to equation (2), for the jth point in the
interval (one point every 5 ms):

    x'(j) = x_0 + A \sin(\omega j + \phi)                             (2)

    where x_0    = equilibrium position
          A      = amplitude
          \omega = frequency (in degrees per sample point)
          \phi   = phase



Frequency varied every 2-interval gesture, where the gestures were defined
according to the two hypotheses outlined in the previous section. For the C-V
hypothesis, a gesture included the two intervals between successive velocity
extrema (e.g., C1-C2, V2-V3). For the transition hypothesis, a gesture
included the two intervals between successive displacement extrema (e.g.,
V1-C1, C2-V2). We posit that a gesture constitutes a half-cycle. Therefore,
the frequency was computed as the reciprocal of twice the combined duration
of the two intervals comprising a gesture. For example, the frequency used to
model intervals C1 and C2 under the C-V hypothesis was 1 / (2 * (duration of
C1 + duration of C2)). Similarly, the frequency for intervals C2 and V2 under
the transition hypothesis was computed as 1 / (2 * (duration of C2 + duration
of V2)).
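The frequency computation can be written out as a small worked example. The durations used are hypothetical, and the conversion to degrees per sample point assumes the 5 ms sample spacing stated for equation (2); the function names are illustrative.

```python
def gesture_frequency_hz(duration_a_s, duration_b_s):
    """Frequency of a gesture treated as a half-cycle:
    1 / (2 * (combined duration of its two intervals))."""
    return 1.0 / (2.0 * (duration_a_s + duration_b_s))

def hz_to_degrees_per_sample(freq_hz, sample_period_s=0.005):
    """Convert cycles per second to degrees of phase advanced per sample point."""
    return 360.0 * freq_hz * sample_period_s

# Hypothetical durations for two intervals forming one gesture (seconds).
freq = gesture_frequency_hz(0.060, 0.065)    # combined duration 0.125 s
omega = hz_to_degrees_per_sample(freq)       # phase step per 5 ms sample
half_cycle = omega * (0.125 / 0.005)         # phase covered over the gesture
```

With these values the gesture frequency is 4 Hz, the phase advances 7.2 degrees per sample, and the 25 samples of the gesture cover 180 degrees, that is, a half-cycle.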


Since our primary interest in this study was in the frequency parameter,
we allowed the values of the equilibrium position and amplitude to change
every interval, so as to fit the displacement of the interval as closely as
possible. The phase angle of a sine wave is 90 degrees at maximum
displacement (the peaks), and 270 degrees at minimum displacement (the
valleys). Amplitude and equilibrium position values were determined by the
constraint that model and data agree exactly at these points, both in phase
and in displacement. That is, the observed peaks and valleys were assumed to
be the displacement extrema generated by the underlying model. The analogous
assumption was not possible for the velocity extrema, however, since often
the velocity extrema were not mid-way between the displacement extrema (as
they would be if the parameter values weren't changing; cf. Figure 1). Thus,
the observed velocity extrema did not correspond to 0 and 180 degrees in the
modeled trajectories. Rather, the phases for these points in the model were
permitted to vary according to the constraint that model and data agree
exactly here as well as at the displacement extrema.
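One way to read these constraints is sketched below for a single interval. It is an interpretation, not the original fitting program: the phase is pinned to 90 or 270 degrees at whichever end of the interval is a displacement extremum, the phase at the other end then follows from the gesture frequency and the interval length, and the amplitude and equilibrium position are solved so that the model passes through both observed endpoints. All names and values are hypothetical.

```python
import math

def fit_interval(y_start, y_end, n_steps, omega_deg,
                 extremum_phase_deg, extremum_at_end=True):
    """Solve for (x0, amplitude, start_phase) in x'(j) = x0 + A*sin(omega*j + phase).

    y_start, y_end     : observed displacement at the first and last sample
    n_steps            : number of sample steps spanned by the interval
    omega_deg          : gesture frequency in degrees per sample point
    extremum_phase_deg : 90 for a peak, 270 for a valley
    extremum_at_end    : True if the displacement extremum is the last sample
    """
    span = omega_deg * n_steps
    start_phase = extremum_phase_deg - span if extremum_at_end else extremum_phase_deg
    s_start = math.sin(math.radians(start_phase))
    s_end = math.sin(math.radians(start_phase + span))
    if math.isclose(s_start, s_end):
        raise ValueError("endpoints share a sine value; amplitude is undefined")
    amplitude = (y_end - y_start) / (s_end - s_start)
    x0 = y_start - amplitude * s_start
    return x0, amplitude, start_phase

def model_points(x0, amplitude, omega_deg, start_phase_deg, n_steps):
    """Generate the model points for j = 0 .. n_steps."""
    return [x0 + amplitude * math.sin(math.radians(omega_deg * j + start_phase_deg))
            for j in range(n_steps + 1)]

# Hypothetical closing interval: velocity peak at 2.0 mm rising to a
# displacement peak at 5.0 mm over 10 samples, at 7.2 degrees per sample.
x0, amp, phase = fit_interval(2.0, 5.0, n_steps=10, omega_deg=7.2,
                              extremum_phase_deg=90.0)
points = model_points(x0, amp, 7.2, phase, n_steps=10)
```

In this example the velocity extremum falls at a phase of 18 degrees rather than 0, which illustrates why the phases at those points had to be left free.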

Results 

Sinusoidal models are strikingly successful in fitting the articulatory
data. Figure 3a shows the model trajectory generated for the C-V hypothesis
superimposed on the real trajectory for our sample token of ['babəbab]. The
curves lie almost completely on top of one another, diverging substantially
only during the C1, C2, and C6 intervals. This particular token is the best
modeled of all ['babəbab] tokens, as measured by the mean square error of the
modeled points. The token with the worst fit, not only for this utterance but
for all the utterances, is shown in Figure 3b. Again, the curves lie almost
completely on top of one another, diverging substantially only in the same
places as in Figure 3a.


In general, the modeled trajectories for both hypotheses and for all
utterances fit comparably to the trajectories shown in Figure 3a. Table 2
gives the mean square error averaged across all tokens for each of the four
utterances under the C-V and transition hypotheses. The two hypotheses differ
by only a small amount, but the C-V hypothesis appears to be consistently
better. Comparison of individual tokens supports this slight superiority of
dividing the trajectory into consonant and vowel gestures.
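The goodness-of-fit measure can be sketched as follows. The exact averaging procedure is not spelled out in the text, so the sketch assumes a per-token mean of squared differences over all modeled points, followed by a simple mean across tokens. The function names and sample values are hypothetical.

```python
def mean_square_error(model, data):
    """Mean of squared differences between model and data points (mm^2 if inputs are in mm)."""
    if len(model) != len(data):
        raise ValueError("model and data must have the same number of points")
    return sum((m - d) ** 2 for m, d in zip(model, data)) / len(data)

def mean_across_tokens(token_errors):
    """Average the per-token errors for one utterance type."""
    return sum(token_errors) / len(token_errors)

# Hypothetical model and data points (mm) for two tokens of one utterance.
token_a = mean_square_error([1.0, 2.0, 3.0], [1.0, 2.0, 3.3])
token_b = mean_square_error([1.0, 2.0, 3.0], [1.1, 2.0, 3.0])
utterance_error = mean_across_tokens([token_a, token_b])
```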





Table 2

                 Mean Square Error (mm²)
Utterance    C-V Hypothesis    Transition Hypothesis
bibə'bib     .0154             (illegible)
'bibəbib     .0466             .061?
babə'bab     .0358             .0471
'babəbab     .0907             (illegible)
Figure 3. Sample comparisons of superimposed model (C-V hypothesis) and data
trajectories. (a) ['babəbab] token with the best fit. (b) Token with the
worst overall fit, which is also a token of ['babəbab].
The contribution of different intervals of the trajectories to the error
can be seen in Figure 4. The four curves show the model and data superimposed
for the best tokens of each of the utterance types, under the C-V hypothesis.
Utterances with [i] are shown on the left (a and b), and utterances with [a]
are shown on the right (d and e). The graphs at the bottom of the figure show
the mean square error for the individual intervals from V1 to C7. These are
averages across all tokens of a given utterance. Again, results for
utterances with [i] are shown at the left, and with [a] at the right.
Intervals occurring in stressed and unstressed syllables are shown
separately.

The error distributions show that the worst fit is found for item-initial
stressed consonants, for both [a] and [i] utterances. In particular, interval
C2, the release of this initial stressed consonant, is poorly modeled
relative to the other intervals. The release of the stressed consonant is
also relatively poorly modeled in final syllables containing [a]. Examining
the trajectories in the poorly modeled regions of ['babəbab] in Figure 4d, we
can see that the actual consonant trajectory (indicated by arrows) shows a
flatter top than that predicted by sinusoidal trajectories. This can,
perhaps, be explained by noting that it tends to occur in regions in which
the lower lip is raised quite high against the upper lip. The flattening may
be the result of some limit on the compressibility of the lips.
Alternatively, it may be that there is some tendency for initial stressed
consonants to be "held," suggesting a somewhat different kind of dynamical
system (e.g., a damped mass-spring).

The error distributions also show a clear tendency for the reduced
syllables to have the smallest error. This may partly be due to the fact that
the actual displacement differences between the beginning and end of such
intervals tend to be very small, and given that the ends are perfectly
modeled, there simply isn't much room for error. Similarly, there is some
tendency for utterances with [i] to show less error than utterances with [a].
Again, the lower lip shows less movement with [i] than [a], leaving less room
for error. However, the smaller amplitude of movement does not completely
account for the better fit. Correlations between amplitude of movement and
error are not high, for example, .242 for [babə'bab]. Thus, the
straightforward mass-spring model we have chosen to investigate appears to be
adequate for the unstressed and reduced syllables, but needs to be modified
for stressed, item-initial consonants.

In addition to goodness-of-fit considerations, a dynamical phonetic
structure can also be evaluated with respect to how well it can elucidate
systematic variation. For example, we can examine how the values of the model
parameters vary as a function of context. Given the preliminary nature of our
modeling, we will simply show some easily observable trends, rather than
present a detailed statistical analysis.

The bars with solid lines in Figure 5a show the mean value of the frequency
parameter for the consonant gesture under the C-V hypothesis, as a function
of the consonant's stress and position within the item for the two vowel
contexts. Only the three consonants preceding vowels are shown. The first
thing to note about the data is that the nature of the vowel in the item
([i] or [a]) has little effect on the consonant frequency (although
unstressed [b] has a lower frequency before [a] than before [i]). That is,
the consonant frequency is independent of vowel quality. Stress, however,
clearly shows a systematic influence on the frequency of the [b] gesture.
The consonant has a higher frequency in unstressed syllables than in
stressed syllables, for both the initial and final syllables in the item. In
the medial reduced syllable, the consonant has the highest frequency of all.
Kelso et al. (1984) also found unstressed gestures to be stiffer than
stressed gestures, which is equivalent to an increase in frequency. This
pattern of variation is completely consistent with the lengthening effect of
stress as measured acoustically (e.g., Klatt, 1976; Oller, 1975).
Additionally, there is variation according to position. Item-initial stressed
consonants are lower in frequency than consonants in the final syllable of
the item. Again, this is consistent with observed acoustic word-initial
consonant lengthening (Oller, 1973).

Figure 5. Consonant and vowel frequencies generated by C-V and transition
hypotheses, according to vowel in item, stress level, and position within
item. For reduced consonants and vowels, always in medial position, stress
or unstress refer to the preceding syllable.

The vowel gestures are analyzed in a similar way in Figure 5b. Reduced
vowels have higher frequencies than full vowels, as expected from the
consonant data. Full vowels, however, do not behave quite as systematically
as the consonants. For unstressed full vowels, there is little or no
difference between [i] and [a] in frequency. Stressed full vowels, however,
show a slight difference depending on whether the item contains [i] or [a].
Stressed [i] has a slightly higher frequency than stressed [a], which
corresponds to the well-known acoustic duration difference noted, e.g., by
Umeda (1975). (Reduced vowels show a possible compensatory effect, in that
reduced vowels in items containing [i] have a lower frequency than those in
items with [a].) The effect of stress for the full vowels is also not
completely regular, but rather depends upon position. Only vowels in initial
syllables show lower frequencies when stressed. Note, however, that vowels in
final syllables are lower in frequency than those in initial syllables, which
is in agreement with the acoustic effect of final lengthening (Klatt, 1975).
It may be, then, that the final-lengthening effect washes out temporal
differences between stressed and unstressed vowels in the final syllable. (At
least one of Oller's (1973) subjects shows this kind of pattern.) Looked at
in another way, when the initial vowel is stressed, it has about the same
frequency as the unstressed final vowel in the same item. That is, the final
lengthening effect is similar in magnitude to the stress effect. This is
consistent with acoustic and perceptual investigations of stress patterns
(Fry, 1958; Lea, 1977).

The bars with dotted lines superimposed on the solid-line bars in Figure 5
show the mean frequencies obtained under the transition hypothesis. For
reasons to be discussed in the next section, the CV transitional gestures
have been superimposed on the corresponding C gestures, and the VC
transitions on the corresponding V gestures. (For example, the consonant in
initial position, which represents the consonant closing and release (C1-C2)
under the C-V hypothesis, represents, under the transition hypothesis, the
consonant release and movement to the following vowel (C2-V2). Similarly, the
initial vowel represents V2-V3 under the C-V hypothesis and V3-C3 under the
transition hypothesis.) Comparison of the dotted lines and solid bars shows
substantial similarity. The only important differences are in the frequencies
of the reduced vowels, which in the transition hypothesis are not higher than
the full vowels. This is perhaps not surprising, given that the VCs that
constitute the reduced syllables (V5-C5) include the initial consonant
interval (C5) of the following unreduced syllable.

To summarize, both the C-V hypothesis and the transition hypothesis fit
the data quite well (except for stressed item-initial consonants), and
generate very similar frequencies. The two hypotheses differ slightly in that
the C-V hypothesis provides a marginally better fit, and they predict
differing patterns of frequencies for reduced vowels. Only stressed and
reduced vowels show a difference in the frequencies generated for items
containing [a] and items containing [i]. Stress level, however, has a
generally consistent effect, with stressed syllables having the lowest
frequency, unstressed syllables somewhat higher, and syllables containing
reduced vowels having the highest frequency. This stress effect fails only
for full vowels in final syllables, which in addition display lowered
frequencies relative to initial syllables. Consonants, in contrast, have
lower frequencies in initial syllables than in final. These stress and
position effects are consistent with acoustic duration effects noted in the
literature. Thus, well-known aspects of the temporal organization of speech
can be accounted for in a model that does not explicitly refer to time.

Implications and Prospects

The success of a very simple dynamical system in modeling the observed 
trajectories of individual gestures gives important empirical support to the 
dynamical approach to phonetic structure. The approach is theoretically 
appealing because it provides a way of explicitly generating articulator tra- 
jectories from a time-free sequence of parameter specifications for consonants 
and vowels. This is made possible by recognizing, as suggested by Fowler 
(1977, 1980) and Fowler et al. (1980), that a phonetic structure is not just
a linear sequence of parameter, or feature, values, but also must be described 
as some particular dynamical organization that the parameter values serve to 
modulate. The successive changes in parameter values can be linked to partic- 
ular points in the underlying dynamical organization. This differs from 
conventional phonetic representations that do not provide any explicit way of 
generating articulatory trajectories from a sequence of parameter specifica- 
tions. 
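
As a purely illustrative sketch (not the implementation used in this study),
the way a time-free list of parameter values can generate a trajectory may be
shown in a few lines of Python. The sketch assumes that each gesture is one
half-cycle of an undamped mass-spring (cosine) movement from the current
position to a target, and that the frequency parameter alone fixes the
half-cycle duration, 1/(2f). The function names, the sampling step, and the
numerical values are hypothetical.

```python
import math


def half_cycle(start, target, freq_hz, dt=0.001):
    """Sample one half-cycle of undamped mass-spring motion (a cosine).

    Assumption for illustration: the movement starts at rest at `start` and
    ends at rest at `target`, so the equilibrium lies midway between them.
    """
    equilibrium = (start + target) / 2.0
    amplitude = (start - target) / 2.0
    duration = 1.0 / (2.0 * freq_hz)          # half a period, in seconds
    n = max(1, int(round(duration / dt)))     # number of sampling steps
    samples = []
    for i in range(n + 1):
        t = duration * i / n                  # evenly spaced from 0 to duration
        samples.append(equilibrium + amplitude * math.cos(2.0 * math.pi * freq_hz * t))
    return samples


def generate_trajectory(start, gestures, dt=0.001):
    """Chain half-cycles; `gestures` is a list of (target, freq_hz) pairs."""
    position = start
    trajectory = [start]
    for target, freq_hz in gestures:
        segment = half_cycle(position, target, freq_hz, dt)
        trajectory.extend(segment[1:])        # drop the duplicated junction sample
        position = target
    return trajectory


# Hypothetical values: lip raised 10 mm at 5 Hz, then lowered at 4 Hz.
lip = generate_trajectory(0.0, [(10.0, 5.0), (0.0, 4.0)])
```

With these made-up values the first gesture lasts 100 ms and the second
125 ms: lowering the frequency parameter lengthens a gesture without any
duration being specified directly.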

The present model is only a preliminary validation of the general ap- 
proach. A number of improvements need to be made before it can be claimed to 
have predictive power. In particular, the interval-by-interval specification 
of amplitude, with end-points exactly matched, needs to be replaced with a 
procedure that allows amplitude to be specified over longer stretches. The
determination of frequency should be made in a way that is less vulnerable to
experimental (and theoretical) error in determining the end-points of the
gestures. Both frequency and amplitude should ultimately be determined by
general linguistic parameters, for example, stress level and position, rather
than by item-specific trajectory matching. These improvements can be carried
out using the present simple undamped mass-spring dynamical model. In addition,
alternative dynamical models need to be explored, in order to account for
poorly-matched item-initial stressed consonants, as well as inter-articulator
compensation effects (cf. Saltzman & Kelso, 1983).

Another area to be investigated further is the organization of the under- 
lying phonetic structure. This paper compared two organizational hypotheses, 
consonant-vowel gestures, and transitional gestures. While both hypotheses 
fit the data quite well, in this preliminary test, there is some indication
that additional organizational hypotheses should be explored in future
modeling attempts.

In the comparison of the two hypotheses, the CV transition was equated
with the C, and the VC transition with the V. This was a post-hoc decision,
based on the similarity of frequencies when the two hypotheses were so
equated. In fact, the frequencies would not appear similar at all if the CV
transitions were equated with Vs, rather than with Cs, and the VC transitions
were likewise switched. Why the frequencies should line up this way is not
clear. It may simply be the case that the intervals immediately following the
displacement extrema (which are the intervals common to the C (or V) gestures
and their equated transition gestures) are those in which frequency is
crucially controlled. This interpretation is supported by results from an
additional analysis, in which frequency was determined interval by interval,
rather than by using two contiguous intervals. In this analysis, exactly those
intervals following the displacement extrema displayed the stress and position
patterns discussed in the preceding section, while the alternate intervals
showed no clear relationship to the linguistic variables. However, there is
also a more interesting account. This involves positing a structure in which
frequency is fixed over a larger span of at least three intervals, e.g.,
C1-C2-V1 and V1-V2-C1. These longer gestures constitute a kind of overlapping
organization (V1 appears in both above), which is independently motivated by
the kinds of coarticulatory phenomena typically observed in speech (cf. the
overlapping segment analysis of coarticulation presented by Fowler, 1983).

Some such concept of overlapping gestures is also suggested by another 
regularity observable in the frequency patterns. The frequency of a consonant 
gesture under the C-V hypothesis is lower than the frequency of the vowel that
follows it. This is counter to the common assumption that consonants involve
short, rapid movements, while full vowels correspond to longer movements. The
common assumption might, of course, be wrong. But such a counter-intuitive
result may also be indicative of methodological problems, such as the choice
of end-points, or of a basic flaw in the hypothesis generating the result.
One obvious candidate for such a flaw is the assumption, in both hypotheses
investigated, of independent, sequential gestures. Such an assumption was
useful as a starting point, but is unlikely to be accurate. Rather, some form
of overlap of the gestures (coarticulation) would likely give a better
picture, and will be permitted in future modeling attempts. A possible
overlapping structure is one in which consonantal gestures are phased relative
to ongoing vowel gestures (cf. Tuller, Kelso, & Harris, 1982).

Finally, the comparison of the C-V hypothesis with the transitional hy- 
pothesis carries certain implications, not only for future research into 
phonetic organization, but also for the interpretation of past studies. 
Investigations into the nature of speech articulator movements have tacitly 
assumed the transition hypothesis (e.g., Kuehn & Moll, 1976; Parush et al.,
1983; Sussman et al., 1973), and have consequently couched the description of
their results in terms of opening and closing gestures. The present study, 
however, shows that the C-V hypothesis provides an organization that captures 
all of the same generalizations in the data as the transitional hypothesis; 
one that fits the data as well as or better than the transitional hypothesis; 
and moreover, one that is more immediately relatable to traditional linguistic 
units. In addition, while the two hypotheses generally produce equivalent 
frequency analyses, in at least one case, that of reduced vowels, they appear
to differ substantively. The present study does not constitute evidence for
one hypothesis over the other, given the overall similarity in fit. However,
it does constitute evidence that the C-V organization, or some variant
thereof, warrants serious consideration in the interpretation of speech
articulator movement data. In general, we think that bringing dynamical
principles to bear on problems of linguistic organization will lead to more
linguistically-relevant accounts of speech production, as well as to a much
richer, yet simple, conception of phonetic structure. The structure comprises
an underlying dynamical system with associated parameter values. Together, the
system and its parameters explicitly generate patterns of articulator
movement. In addition, as we have demonstrated, such structures can retain the
useful descriptive properties of more conventional phonetic representations.

COARTICULATION AS A COMPONENT IN ARTICULATORY DESCRIPTION*

Katherine S. Harris†



Coarticulation in Conventional Descriptions 

In the recent past, the speech pathologist was often given a course in
"articulatory phonetics." This study had as its goal teaching the student to
make a series of alphabetlike symbols on a piece of paper, which, if the
training was successful, would enable the student to perform such tricks as to
read aloud in the "dialect" of the original speaker. Indeed, in the academic
setting where this form was most highly developed (London University, the home
base of Henry Sweet), these methods were used to change speech patterns not
only of countless cockneys but also of the many non-native speakers of English
who swarmed to London in the great days of the British empire. Of course,
Sweet was the historical model for the hero of the play Pygmalion and the
musical My Fair Lady (Borden & Harris, 1980).

In such training schemes, it was routinely assumed that there was no 
great difficulty about producing an adequate representation of the detailed 
act of speaking from alphabetlike marks on the page in which the only 
representation of time was the indication of visual succession (Lisker, 1974).
Even now, it may be debated whether our knowledge of the principles of 
alphabetic writing is what underlies a belief in the adequacy of the sym- 
bol-by-symbol representation of speech, or whether, alternatively, the princi- 
ples of alphabetic writing depend on some property of the perceptual system 
that makes such a representation seem adequate. Whichever formulation one 
prefers, there is a long history of a relationship between the study of 
phonetics and the desire of various authors, at various times, to commit oral 
narratives to writing. For example, as long ago as the 12th century, an Ice- 
landic scholar wrote the "first grammatical treatise," an attempt to rework 
the orthography of Roman writing to suit the demands of representing the 
sounds of his native tongue (Fischer-Jørgensen, 1975).

The assumption that a series of symbols is an adequate representation of
a child's articulation is one of the two basic assumptions of the typical
course taken by the speech pathologist. The other is that, by listening, the
transcriber can infer articulation or, at least, that aspect of articulation
that is frequently all that the course provides: a schematic lateral view of
the steady-state position of the articulators, to be associated with the
left-to-right alphabetic labels of transcription.



*Also in R. G. Daniloff (Ed.), Articulatory assessment and treatment issues.
San Diego, CA: College-Hill Press, 1984.

While the skills of a well-trained phonetician to reproduce speech are
[illegible], it is not clear what information is used in performing. Even very
well-trained phoneticians may not do a very good job of judging the
articulator position associated with a given phone. For example, Ladefoged
(1967) has shown that London-trained phoneticians cannot accurately assign
tongue positions to the "cardinal vowels" produced by their London-trained
colleagues. (The cardinal vowels are a reference system of articulator
positions that give a kind of grid for vowels.) Indeed, it is his contention
that vowels are sorted into categories on the basis of acoustic rather than
articulatory similarity. In part, the phonetician's difficulty in making
articulator position inferences is the inevitable result of the asymmetry of
the relationship between acoustics and articulator position. Theoretically,
although the acoustic signal can be estimated from a sufficiently detailed
knowledge of vocal tract shape, a given acoustic signal may be associated with
any of an infinite number of vocal tract shapes. An amusing example is
provided by Ladefoged (Ladefoged, Harshman, Goldstein, & Rice, 1978). He shows
two lateral views of the vocal tract. In one view, the vocal tract has a
physiologically sensible contour. In the other, the tongue appears to have
been creased into pleats. The two shapes are acoustically indistinguishable.



Figure 1. Two vocal tract shapes which generate the same formant values.
Reproduced from Ladefoged, P., et al., 1978, op. cit.

In many of the more recent versions of our hypothetical course in
articulatory phonetics, it is suggested that students learn to transcribe in
"feature" notation, although a number of alternative descriptions that fit
this rubric have been proposed (e.g., Chomsky & Halle, 1968; Ladefoged, 1971;
Singh, 1978). It is not my interest here to attack the value of feature
descriptions in principle. Within speech pathology as a field, they are
useful in describing such diverse phenomena as the confusion matrices generated
by hearing-impaired listeners in speech perception studies (Bilger & Wang,
1976) and the transfer of training in articulation correction (Compton, 1976;
Pollack & Rees, 1972). The classic feature description is temporally
isomorphic with a phonological description. The feature description, in its
most sophisticated form (Chomsky & Halle, 1968), was developed to capture
certain kinds of generalization within linguistics, such as morphophonemic
alternation rules; the fact that the features have a physiological referent is
not, in principle, an issue within the generative phonology framework. From
the point of view of temporal structure, the features are abstract and
timeless in the same sense as the units they were designed to replace.

The picture of speech production that our hypothetical student might 
infer, then, would be that the act of speaking proceeds from steady state to 
steady state, with (since the articulators must move continuously) some 
uninteresting events between, and that the articulatory origins of the steady 
state events are fairly transparent. 

For many members of the research community, the sheer conspicuousness of 
the dynamic, as contrasted to the static, characteristics of the speech signal 
was first revealed by the illustrations in the book Visible Speech (Potter, 
Kopp, & Green, 1947), when it appeared shortly after the Second World War. 
The book represented, in many ways, the culmination of efforts by the Bell
Telephone Laboratories to execute a mission inherited from Alexander Graham 
Bell himself. Bell had an interest both in the visual representation of 
speech and in using this representation to aid the deaf in learning to talk 
(Borden & Harris, 1980; Bruce, 1973). The attitude taken by Potter, Kopp, 
and Green towards the temporal structure is an interesting one, given their 
pedagogical purpose; one must learn to recognize the "characteristic
position" or "hub," and the coarticulatory influences on it. While mention was
(necessarily) made of the time-varying nature of the pattern, they said almost 
nothing about the time course of events as characteristics of speech sound 
representation. In other words, they took a segmental approach, although the 
dynamic aspects of the pattern were quite conspicuous.

It is a mistake to suppose that phoneticians whose main work preceded the 
sound spectrograph were wholly unaware of temporal phenomena, although these 
phenomena fit uneasily into any transcriptional description. For example, 
diphthongs are conventionally transcribed with two symbols, although their dy- 
namic character was recognized. Jones, Sweet's successor, said: "For the 
purpose of practical language teaching it is convenient to regard a diphthong 
as a succession of two vowels, in spite of the fact that, strictly speaking, 
it is a gliding sound" (Jones, 1956, p. 99).

Earlier phoneticians were also well aware of the consequences of
articulator movement; articulator position for one sound might influence that
for a temporally adjacent one. This is the phenomenon called assimilation by
Jones. For a common example, in the pronunciation of "these shoes" in ordinary
speech, the /z/ /ʃ/ sequence is reduced to a single tongue movement to provide
a suitable position for /ʃ/. However, phoneticians were unaware of the extent
to which the phenomena described above were not special examples; that is,
since articulator position changes continuously, context sensitivity is the
rule, rather than a phenomenon to be explained in special cases. Much of the
effort for the following decade in the study of both speech production and
speech perception was to build theories to account for the mismatch, in
perception and production, between transcriptional phonetics and the phenomena
of speech production. Theories in this field may be divided into two broad
classes; we might call them discrete and continuous.

Theories of Speech Production

In this section, we will discuss some fairly recent speech production and
perception theories. To a certain degree, these theories were aimed at
rationalizing transcriptional or perceptual simplicity in the face of acoustic
or articulatory variability. As noted above, the theories are of two basic
kinds: discrete and continuous.

As an example of a discrete model, one might choose Perkell's model of
the speech production process (Perkell, 1980), which is, in turn, based on
Stevens' quantal model (1973). Without going into the details of the model
shown in Figure 2, it can be seen to have stages such that the input, at the
top of the figure, is a series of segments (S1, S2, S3, and S4) with each
segment specified by a feature matrix, which is transformed into an isomorphic
sensory goal. In the output, due to various hypothesized mechanisms, the
boundaries between segments are no longer perpendicular lines, so that the
"motor goals" and the segments are no longer isomorphic. This model is very
like the one proposed by Henke (1966) to explain coarticulation, which will be
discussed below. Two points should be noted: the only representation of time
in the input is a simple succession, as in transcription, and the effect of
reorganization is to desynchronize the representations of the transcriptional
units.

An alternative point of view, although in very primitive form, is
represented by Liberman's motor theory (Liberman, Cooper, Shankweiler, &
Studdert-Kennedy, 1967). The motor theory is designed to account for the
finding that two acoustic synthetic speech patterns will both produce the
perceptual impression of the same consonant /d/, coupled with two different
vowels (see Figure 3). Apparently the percept depends in some rather direct
way on the dynamics of the acoustic pattern. The motor theory assumed that the
listener must perform some operation dependent on the articulatory dynamics of
production. This is a continuous theory because the dynamics of the pattern
are important in themselves. It should be pointed out, however, that while the
motor theory can be described as a continuous theory, Liberman has produced a
stage model of the production-perception process that is quite similar to
Perkell's (Liberman, 1970), and that Perkell, while in this classification a
discrete theorist, has produced an extremely elegant discussion of
articulatory dynamics from a quantal theory perspective (Perkell & Nelson,
1982).

It should be noted here, as well, that there is an apparent dichotomy
between theories with some kind of linguistic referent, as discrete, and
theories with some kind of motor referent, as continuous. This dichotomy is
certainly not a necessary one. Thus, Fowler has argued (1977) that although
symbols for speech may be represented in a particular form on a page, this
does not mean that their motor representations take the same form in the
nervous system. It is possible to argue that a representation of a motor plan
in which the speech act is conceived as present somewhere in the nervous
system, stripped of its temporal properties, which are then added in the
execution, is very like the conception of speech as a phonological string.

Figure 2. A figure showing Perkell's model of the translation of a feature
matrix representation into articulatory units. Reproduced from
Perkell, J., 1980, op. cit.

Figure 3. Two patterns perceived as /d/, followed by different vowels.
Reproduced from Liberman, A., 1970, op. cit.

It is important when examining theories of coarticulation in detail, as
we shall do below, to recognize that the study of coarticulation is merely a
small part of the study of skilled movement. Speech is special, as a type of
skilled movement, in some rather unfortunate ways. For one thing, as we
discussed above, speech comes with a notation scheme developed for special
purposes, which may lead us astray when we attempt a more physiological
description (Moll, Zimmermann, & Smith, 1977). Speech comes, as well, with a
very inaccessible set of independent variables, as most articulators are
difficult or impossible to observe without special techniques. However, even
if experimental data on the movement of the articulators were easily gathered,
one could not develop a theory of coarticulation simply by turning to a
formulation lying ready-made in, for example, robotics. Machines can be
produced that will mimic particular acts, but machines cannot now be designed
that will adapt to a wide variety of changed environmental conditions, as
humans do (Kelso, 1981). Furthermore, while we know a great deal about the
muscular and neurological structures that participate in movement, the
increase in our knowledge of structure does not help us very much with respect
to function. For example, although a recent review chapter (Matthews, 1981)
testifies to the explosion of our knowledge of the microstructure of the
muscle spindle, a specialized device that provides feedback information about
movement, the basic behavioral questions we ask about movement today are not
very different from those we asked in the early 1930s, when Bernstein began
his studies of the coordination of gait (summarized in Bernstein, 1967) or,
perhaps, even when Sherrington summarized his observations of the decerebrate
cat (Sherrington, 1906). We still lack a comprehensive theory that explains
why skilled movements can be scaled up and down in timing, what causes the
resistance of movement patterns to disruption by environmental change, and,
with reference to coarticulation, why the elements of skilled movement
patterns can be so freely reassembled to form novel sequences. While we have
theories of coarticulation, as we will see below, they can rather easily be
shown to fail. In what follows, I will attempt to outline the proposals for a
model of coarticulation and to show how existing data succeed or fail in
supporting them.

Hypotheses about Organizational Units and Speech Planning

Coarticulation as conventionally described is but one of a number of
phenomena indicating some kind of organizational cohesiveness in speech. A
great deal of effort has been directed at defining the outer bound over which
such organizational cohesiveness exists. Unfortunately, the larger the unit
that has been investigated, the larger the unit over which organizational
dependencies can be demonstrated. For example, Lehiste has shown evidence for
paragraph cohesiveness over units that are larger than sentences. Speakers
apparently signal first and last sentences in paragraphs by a number of means.
The initial sentence in a paragraph is often signaled by high fundamental
frequency, the last sentence by low fundamental frequency and laryngealization.
There are, in addition, durational cues for the termination of paragraphs,
although the way duration is used is language dependent (Lehiste, 1975, 1979,
1980a, 1980b). It may be that the question of the outer bound of such effects
is not a meaningful one. However, even if the absolute outer bound of such
effects is indeterminate, we can ask what these effects tell us.

Underlying an interest in many of these phenomena is what Monsell and
Sternberg (1982) have called the utterance program hypothesis. "Certain basic
assumptions [about theories of speech production] seem to be widely shared
among psychologists, linguists, and other students of speech. One such is the
claim, explicit or implicit, that the motor events of an utterance are
controlled by the execution of a plan or program -- an integrated and
relatively detailed description of the utterance (or a large part of it)
constructed as a whole before the utterance begins. We term this claim the
utterance program hypothesis" (p. 1). The utterance program hypothesis has
been used in explanation of coarticulation itself and in connection with
discussions of related phenomena, such as declination and slips of the tongue.
By considering the latter kinds of phenomena first, we can perhaps clarify our
discussion of coarticulation theories, which follows.

First, however, we note that many discussions of speech motor plans are 
circular. An observation is made that speech is normal in one circumstance
and abnormal in another. The difference is attributed to the correct or in- 
correct functioning of a motor plan. For example, our typical student of 
speech pathology has heard that the articulation difficulty of some
populations is due to the failure of a motor plan. The important thing to note
is that the invocation of the motor plan adds nothing to the behavioral
observation that the population does not speak normally (Kelso & Saltzman,
1982).

A somewhat veiled version of this circular kind of argument is one in 
which the naked motor plan is given some kind of neuroanatomical or 
neurophysiological clothing. For an example outside of speech, the control of 
many kinds of rhythmic activity, such as walking, has been ascribed to the
behavior of neural oscillators (Gallistel, 1980), which are not independently
observed. While one might not wish to return to the kind of anathema on
physiological theorizing dictated by Skinner (1938), it is important to
recognize that one of the motivations for his prohibition still holds: there
is no explanatory power in the restatement of an observation in different
language, even when the language has an independent prestige. Thus, we find
out nothing about an aphasic's speech by saying that it is due to the
malfunctioning of a particular neural circuit unless we are experimentally
prepared to launch a search for the circuit or unless what we know from other
sources about a neural circuit of the proposed type allows us to make
inferential predictions that we can test about the resultant behavior of
aphasics.

A related problem with the metaphor of the motor plan has been raised by
Turvey and his associates: it is that the existence of behavioral system
activity does not require a single controlling mechanism that lies at a
particular level in the nervous system and specifies in detail the properties
to be controlled (Turvey, 1977), and, indeed, there are logical problems with
the whole idea of a single control center. It may be that some of the
characteristics of motor control, which have been attributed to the operation
of a plan, are properties of the motor system itself, which emerge as it
behaves. Thus, the fact that bees build hexagonal honeycombs does not mean
that the bee has a hexagonal floor plan in his central nervous system; rather,
the honeycomb may arise in its hexagonal form as a consequence of the
interactional properties of the bee and his environment, as the honeycomb is
constructed.

Given that we observe carefully these prohibitions on how much we
attribute to motor plans, let us return to what we know about the temporal
organization of speech. We will talk largely, but not entirely, about
precursory effects, that is, the effects that indicate anticipation of speech
output before it occurs. While precursory effects tell us nothing about their
causes, they tell us something about relevant temporal domains.

An important and much studied phenomenon is the slip of the tongue, in
which exchanges occur between elements in speech. A famous example was
produced by William Spooner, an English clergyman, who once said, "You've
hissed my mystery lectures" in place of "You've missed my history lectures"
(Fromkin, 1971). Slips of the tongue are important to a discussion of
coarticulation for several reasons. The first, and most important, is that the
primary sublexical exchange units seem to be elements very close to the single
phonemic segment (Shattuck-Hufnagel, 1983). Apparently, these phonemic
segments are correctly produced for their new positions. The existence of such
shifts is probably the best evidence we have of the existence of a premotoric
terminal stage in the speech production process (MacNeilage, Hutchinson, &
Lasater, 1981). Apparently, the units that shift adapt to their new positions;
that is, they are correctly coarticulated with their neighbors. Thus, even
though we cannot precisely define the phonemic unit in such a way that we can
isolate it in the speech stream, slips of the tongue provide some evidence
that a phoneme has reality as an action unit. It is interesting to note that
although phones participate as action units, single features do not,
evidently, appear in exchange error units (Shattuck-Hufnagel & Klatt, 1979).

A final point may be made about slips of the tongue. The sphere over 
which they occur appears to be of the general length of a breath group. This 
is roughly the temporal domain of declination and, perhaps, of durational 
interaction, but it is substantially longer than the temporal extent over
which conventional coarticulation spreads. 

Another recently fashionable bit of evidence for ispeech motor planning Is 
the so-called declination phenomenon — the tendency of utterances to decline In 
fundamental frequency from onset to termination. This is at the utterance 
level, an analog of the phenomenon studied by Lehlste, and cited earlier, that 
the onset cfAhe sentence that comes first In a paragraph is higher In funda- 
mental frequency (FO) than the onset of sentences In later positions* 'Figure 
4 is 9 fairly typical exas^le of sentence declination. Historically, this 
tendency has been characterized in two ways; as a terminal fall (Lieberman, 
1967) and as declination (Maeda, 1975) through the utterance. Again speaking 
historically, it has been unclear whether the relevant phenomena should be 
conceived as local ijsed at the end of the sentence, or as distributed through- 
out. Given that Intonation is almost always studied in the context of syntac- 
tically cotBplex and phonetically variable contexts, an experimentally clean 
decision between these alternatives has been difficult, but at least present 
thinking favors the declination description. That is, the downdrift in FO ap- 
pears to run through the sentence, rather than being localized at the end. A 
related question is whether the mechanism Is passive or active. A passive 
mechanism would ^be one in which the generalized downdrift is a simple conse- 
quence of some physiological given. It might, for exan^le, be a consequence 
of an uncorrected tendency for subglottal pressure and, hence, FO to fall 
throughout the course of an utterance. An alternative would be that the shape 
of the fundamental frequency contour, regardless of its proximal physiological 
cause, is a consequence of active planning of the whole utterance. It has 
been suggested that the latter picture of events is correct because of the 
utterance length effect— the tendency of FO to begin at a higher value in 
longer utterances. In a *»speech planning" point of view, a speaker may begin 
the contour at a higher level in order to come out In the same place. 

" 34 



H«rrls: Coaptlculatlon as a Component in Artlculatory Dcacrlptlon 



[Figure 4 plot: contour over TIME for the sentence ON TUESDAY JAKE ORDERED A
HAMBURGER FOR DINNER]

Figure 4. A figure showing declination in a complex "read" sentence.
Reproduced from Cooper, W. S., and Sorenson, J., op. cit.
Figure 5 shows F0 contours from a recent experiment by Gelfer, Collier,
Harris, and Baer (1983). The contours were produced in reiterant speech; that
is, the speaker mimicked himself producing a more complex utterance with the
syllable aa. The utterances varied in stress placement and in length. Such
structurally simple utterances produce simple fundamental frequency contours
made up of one or more peaks. The initial peak varied in amplitude depending
on the sentence length, but the effect was very small. From the point of view
of speech planning, there were reliable precursive effects, which appear to
reflect an overall rough schema for the utterance. However, notice that the
whole utterance was not reorganized, depending on its length. Whatever
utterance length effects are shown by the declination contour are small and
localized. The domain of the effects, however, is the utterance, a domain of
about the same size as that for slips of the tongue.

We can say, then, that although speakers may demarcate organizational
units of greater length, the longest unit over which there is evidence of
planning, in the form of precursive effects, is a unit of the general length
of a phrase. The examples given here involve slips of the tongue and
declination. Similar material could be provided for unit duration. We turn
now to conventional coarticulation, which operates over a far smaller temporal
domain, on the order of a few speech segments.

Theories of Coarticulation

"Extrinsic Timing" Theories

Since classic theories of coarticulation spring from classic
representations of phonological units, such theories almost by necessity
attempt to represent coarticulatory phenomena themselves as essentially
timeless. In the acoustic real world, no clear boundaries are seen between
segments as conventionally defined. Furthermore, acoustic segments are
context sensitive; therefore it is necessary to develop some theory that
mediates between the acoustic representation and the (presumed) underlying
units. Typical examples of such theories are Henke's look-ahead model of
coarticulation (Henke, 1966) and Daniloff and Hammarberg's canonical forms
model (Daniloff & Hammarberg, 1973). However, other examples of such models
can be cited as well; the models as a class were discussed in more detail in
a very thorough review several years ago (Kent & Minifie, 1977). Here, we
will merely discuss a very well-known example, the Henke model, and refer
readers to the review for more detail.

The Henke model assumes that all phonological units can be represented as
bundles of features, which occur, in canonical form, as successive units along
a time axis. Each phoneme has a specified value, zero, plus, or minus, for
each feature. In forming articulatory sequences, the speaker performs an
articulatory scanning operation on the phonemes arrayed in a buffer for
output. If a feature is unspecified (that is, has a zero value) for several
phones preceding the phone for which it is specified, then the feature will be
anticipated during the intervening phones; that is, the intervening phones
will assume the same feature value as the upcoming one.
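
The scanning operation just described can be sketched in a few lines of
Python. This is only an illustration of the look-ahead idea, not Henke's own
implementation: the feature name "round", the coding of values as '+', '-',
and '0', and the function name look_ahead are all assumptions introduced
here.

```python
def look_ahead(phones, feature):
    """Fill unspecified ('0') values of `feature` with the next specified one.

    `phones` is a list of dicts mapping feature names to '+', '-', or '0'.
    Returns the surface values of `feature`, one per phone. Unspecified
    phones with nothing specified after them stay '0'.
    """
    surface = [p.get(feature, "0") for p in phones]
    upcoming = "0"
    # Scan right to left so each phone sees the nearest specified value ahead.
    for i in range(len(surface) - 1, -1, -1):
        if surface[i] == "0":
            surface[i] = upcoming
        else:
            upcoming = surface[i]
    return surface


# Hypothetical coding of /istu/: a spread vowel, two consonants unspecified
# for rounding, and a rounded vowel.
istu = [{"round": "-"}, {"round": "0"}, {"round": "0"}, {"round": "+"}]
print(look_ahead(istu, "round"))  # ['-', '+', '+', '+']
```

In this sketch both consonants take on the rounding of the upcoming vowel,
while the spread vowel, being specified as minus, blocks further spread.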

Thus, in a sequence of a spread and a rounded vowel separated by
nonlabial consonants, the consonants will assume the rounding feature. The
test of this thesis has been to ask speakers to produce utterances like o nc e
-true (Daniloff & Moll, 1968) and then to examine the sequence for the time of
the onset of rounding. Using tests of this sort, evidence has been produced
that anticipatory coarticulation may spread over as many as four or five
segments (Benguerel & Cowan, 1974; Daniloff & Moll, 1968). The model has also
been used to explain the early onset of velar lowering in sequences of vowels
concluding with nasal consonants. Presumably, in English, vowels are
unspecified for nasality; hence, when they precede nasals, they become
nasalized.

Success or failure of the model in explaining the data depends on its two
assumptions: first, that coarticulatory spread is timeless, and, second, that
whatever feature description is made of phones is adequate. For example, Kent
and Minifie argue that while the Henke model is assumed to explain the
findings of Benguerel and Cowan, it cannot explain the occasional spread of
rounding to the end of the preceding spread vowel. One way of giving a "quick
fix" to the theory is by relaxing the feature description requirements. Thus,
one might assume that the end of a spread vowel is not specified for rounding.
A second, and more plausible, approach is to give up the assumption of
timelessness in speech production.

A Temporal Theory 

In the light of a "coproduction" theory by Fowler (1980), discussed
below, Bell-Berti and Harris (1981) proposed a temporal model of
coarticulation as a substitute for feature-based models. Because Fowler's
theory has been somewhat enlarged since it was originally presented, we will
discuss the Bell-Berti and Harris view first.

In brief, it was Fowler's thesis that current theories fail to make an
appropriate recognition of the temporal dimension in speech production itself.
Thus, a theory of anticipatory coarticulation that fails to acknowledge the
time course of articulation will fail. She suggested, as an alternative to
the view that static elements of vowel and consonant productions are
exchanged, that vowel and consonant are coproduced.

A simple model of anticipatory coarticulation, then, makes three
propositions (Bell-Berti & Harris, 1981): first, the articulatory period of a
segment is longer than its acoustic period; second, for a given articulator,
the period of anticipation is temporally independent of preceding phone string
number, provided there is no articulatory conflict; and third, the
articulatory period may begin at different times for different articulators.

These propositions were tested using electromyographic techniques for
anticipatory coarticulation of lip rounding (Bell-Berti & Harris, 1979, 1981,
1982). The test is quite simple. If anticipatory coarticulation is segment
based, then its onset will vary with the number of segments; if it is time
based, then the duration of anticipatory coarticulation will be independent of
the number of segments in a string, provided they do not themselves block
coarticulation. Therefore, in order to provide a test, speakers were asked to
produce utterances of the type [iC_n u], with a variable number of consonants
in intervocalic position. Typical results are shown in Figure 6; the onset
of lip rounding, that is, the duration of the anticipatory period, is
independent of the number of anticipatory segments, or of their durations,
except for the single voiceless stop condition /itu/.
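
The contrast between the two predictions can be made concrete with a small
Python sketch. Everything numeric here is invented for illustration: the
consonant durations, the 200 ms fixed lead, and the function names are
assumptions and not values reported by Bell-Berti and Harris.

```python
def lead_segment_based(consonant_durations_ms):
    """Segment-based prediction: rounding starts with the first intervening
    consonant, so the lead before the rounded vowel grows with the string."""
    return sum(consonant_durations_ms)


def lead_time_based(consonant_durations_ms, fixed_lead_ms=200.0):
    """Time-based prediction: rounding starts a fixed interval before the
    rounded vowel, however many consonants intervene."""
    return fixed_lead_ms


# Hypothetical consonant durations (ms) for strings with 1, 2, and 4
# intervocalic consonants.
strings = {
    "isu": [120.0],
    "istu": [110.0, 80.0],
    "iststu": [100.0, 70.0, 100.0, 70.0],
}

for name, durations in strings.items():
    print(name, lead_segment_based(durations), lead_time_based(durations))
```

Under the segment-based account the predicted lead changes from string to
string, whereas under the time-based account it stays constant, which is the
pattern the experiment was designed to distinguish.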
The Bell-Berti and Harris study was repeated, in part, by Sussman and
Westbury (1981). They examined anticipatory coarticulation in the strings
/kikstu/, /kakstu/, /tiku/, and /taku/. They found different onset times for
all four utterances by electromyographic measures, although not all
differences were statistically significant. A repeat of the experiment using
strain gauge measures found no differences between /kikstu/ and /tiku/. They
argued that the failure to find identical onsets for /kikstu/ and /tiku/ or
for /kakstu/ and /taku/ argues strongly against time locking of coarticulation
to the vowel. They also point out that anticipatory coarticulation is earlier
for strings in which the first vowel is /i/ than those for which it is /a/.
Their suggestion, in explanation of the latter finding, is that the rounding
following /i/ begins earlier because of a necessity to counteract the
biomechanical forces that spread the lips for /i/. They support neither the
anticipatory scanning model nor the temporally-locked model, although their
look-ahead scanner model is segment based.


Both their results and ours point strongly to one experimental issue,
noted above; that is, that articulatory constraints are unpredictable from
the feature specification of phones. The two vowels /i/ and /a/ are, in
feature specifications for rounding, respectively minus and zero. Yet
rounding onset time is affected in a manner that is contrary to the feature
prediction.

The deviance of the data point for the /itu/ sequence is less conspicuous
in the Harris and Bell-Berti data than the /iku/ sequence in the Sussman and
Westbury paper, since the latter authors are plotting a two point continuum.
On the assumption that the sequences are equivalent in the two experiments, a
possible explanation of the deviance of the onset for rounding for single
intervocalic stop sequences is provided by Engstrand (1983). He pointed out
that relaxation of rounding has been shown to occur in the sequence /utu/
(Gay, 1978; Harris & Bell-Berti, 1983). He suggests that /t/-burst release
may be incompatible with a fully rounded lip position. If this is so, then
the lips must move rapidly from a fully rounded position, for /u/, to a partly
rounded position for the preceding string. In sequences of the form /itu/,
full rounding must be suppressed rapidly. For all other sequences (/istu/,
/iststu/, etc.), while full rounding must end relatively close to the
consonant release, partial rounding can end anywhere in the preceding string.
The general principle expressed is that production of dentals is incompatible
with full rounding and compatible with partial or no rounding. We would,
then, expect both the onset and the time course of rounding to be important in
a full theory of coarticulation.

Two final experiments on anticipatory lip rounding may be cited: by
Lubker (1981) and by Lubker and Gay (1982). The first, by Lubker, gives
unequivocal support to the view that the onset of lip rounding varies with the
length or duration of the preceding consonant string. The second shows
individual differences in the form of the function relating the
electromyographic onset of rounding to number of consonants in an intervocalic
string. However, this study did not examine consonant string duration but
merely consonant number.

In all of the above, we have concentrated on the anticipatory
coarticulation of lip rounding. It should be pointed out that there is a
similarly detailed, and almost as confused, literature on anticipatory
nasalization. Indeed, Al-Bamerni and Bladon (1982) suppose that there may be
two forms of anticipatory nasalization, one time locked and one variable.
However, this seems a heuristically unsatisfactory solution.






In reading through this account of a series of experiments with their
disparate results, the reader should be forgiven for some feeling of
bewilderment. It may be worthwhile to consider what we do know. First, it is
clear that conventional feature descriptions of phones are not strong enough
to predict the details of their articulation, either spatially, that is, in
terms of their detailed articulatory topology, or temporally, in terms of when
one articulatory gesture begins with respect to another. At present, we are
not sure why there are experimental differences among investigators. The only
present solution seems to be a more thorough investigation, using simultaneous
electromyographic, acoustic, and movement techniques.

Coarticulation and compensatory shortening. Fowler's comments on
extrinsic timing theories of speech production have been cited above.
However, the theory is far richer and more complex than we have indicated. It
was developed, in part, as a means of explaining perceptual isochrony, the
phenomenon that syllables perceived as being of more or less equal duration
are systematically unequal. Some of its principles form a general theory of
production.

Fowler assumes, following Ohman (1966) and Perkell (1969), that vowels
and consonants are coproduced so that neighboring segments overlap; i.e., a
consonant is produced while a vowel is being produced. The speaker can use
such a strategy because vowels and consonants are different kinds of units.
Succeeding vowels are produced as slow changes in the position of the tongue
body in the mouth. Consonant production is more localized, may involve a
partially non-overlapping set of muscles, and is superimposed on the
continuous vowel-to-vowel movement of the tongue. Unstressed vowels are
presumed not to interrupt the trajectory from one stressed vowel to the next.
This model has both spectral and temporal consequences. Let us first consider
the temporal consequences.

It has been shown, very often, that the measured duration of a vowel
shortens as increasing numbers of consonants are added to it (for a review see
Lehiste, 1970). There are backward shortening effects reported as well; that
is, a vowel shortens as increasing numbers of consonants precede it (e.g.,
Lindblom & Rapp, 1973). However, backward shortening (that is, effects of
consonants on succeeding vowels) is much the smaller effect. The effects of
unstressed vowels on stressed vowels are analogous to the effects of
consonants on vowels; for example, the stressed vowel in easy is shorter than
in easily. In Fowler's model, the reason for the shortening is the
articulatory overlap produced by coproduction.

The same mechanism produces spectral coarticulation. If an unstressed
vowel is preceded or followed by a stressed vowel, it should coarticulate with
it. Indeed, coarticulatory and shortening effects are but two measures of the
same thing and should be highly correlated (Fowler, 1981). Fowler's test of
the prediction shows usually significant correlation but some failures in the
detailed prediction, apparently due to peculiarities of the particular
experimental paradigm.

This theory does not make any predictions about lip rounding, because it
is concentrated on the vocal tract manifestations of coproduction, which was
the example used by Ohman. It is hard to believe, however, that some parts of
the system operate on different principles than others. Furthermore, the
model does not cover the well-known shortening effects of consonants on other
consonants (Hawkins, 1973). Perhaps its most serious shortcoming, however, is
that it does not deal with competing articulation, the circumstance in which
the articulators are constricted during consonant production so that free
vowel-to-vowel coarticulation cannot take place. For example, Recasens (1983)
has shown that in Catalan, vowel-to-vowel spectral coarticulation in
vowel-consonant-vowel (VCV) disyllables varies systematically with the extent
to which the intervening consonant engages the blade region of the tongue and,
consequently, makes coarticulation physically impossible. If Fowler's theory
were literally correct, one would expect that differences between VCV
sequences in the extent to which they can be coproduced would be accompanied
by corresponding differences in the amount of possible compensatory
shortening.

Coarticulation and Context Sensitivity 

The laboratory investigations discussed above are, perhaps, of interest
to the speech pathologist in terms of what they can tell him or her about the
practical problems of helping a client to improve a misarticulated sound.
What, if anything, have we learned that is relevant?

It is the common observation that certain phonetic environments
facilitate correct sound production; for example, Curtis and Hardy (1959), in
a now classic paper, showed that some allophones of /r/ are more often
correctly produced than others by misarticulating children. As Kent said, "An
optimistic interpretation of this contextual facilitation is that some
phonetic environments facilitate correct sound production and this
facilitation can be exploited to clinical advantage" (Kent, 1982, p. 66). The
limits on contextual generalization as a teaching strategy are entirely
outside the province of this paper. However, what we can say something about,
as a consequence of this brief review, is the task facing the child in
learning to talk and the investigator in attempting to specify the contexts
that may be relevant subjects of investigation. There are at least two
factors that we will need to learn more about:

1. Relative production variability. The first section discussed the
insecurity that an observer should feel in making inferences about the
articulatory details of production from perceptual judgment. The observer is
right, by definition, in judging a child's production to be correct. What he
or she cannot do is to infer the articulation from the acoustics, the effects
of perceptual factors on his or her criterion, or the nature of the
articulatory error when the speaker is judged to be wrong. Even with respect
to the variability of the acoustic signal for a given phonemic percept in a
given environment, it is obvious that there is more acoustic production
variability in some environments than in others. Some contextual effects may
be contextual effects on listener criterion. For example, the formant values
for correct stressed vowels are less variable than for unstressed vowels
(Sunsners & Soil, 1982). A more often studied case is /s/, a phone that is
notoriously difficult for children and also notoriously subject to contextual
inconsistency (Mazza, Schuckers, & Daniloff, 1979). It may be that part of
the contextual variability is associated with criterion variability, rather
than articulatory variability.

2. Context specification. A lesson to be learned from the literature on
coarticulation is that a decision to consider sounds as dividing into
allophonic classes leads to balkanization. However, it is questionable
whether House's (1981) suggestion that improved transcription may lead to
better accounts of context sensitivity will help. A sound can be shown to be
different in endless ways, depending on factors both within and without the
transcriptional record. In truth, we do not know what contexts form natural
classes.


It has now been shown repeatedly that children learn phones in words,
without uniform generalization across all environments (e.g., Macken, 1980).
These types of context sensitivities must have some significance for practical
decisions about contexts important in defining a class of phones. On the
other hand, certain kinds of context sensitivity are apparently not part of
the learning process in children nor are they stored as separately learned
patterns in adults. The demonstration that two productions are acoustically
or gesturally different does not tell us whether or not the two members form a
natural class. It is only careful study of the natural variability of
children's articulation, coupled with better assessment of what constitutes
motor equivalence and cohesiveness in the adult, that will allow us to make
progress in this difficult field.
CONTEXTUAL EFFECTS ON LINGUAL-MANDIBULAR COORDINATION



Jan Edwards†



Abstract. Coordination between intrinsic and jaw-related components
of tongue blade movement during the articulation of the alveolar
consonant /t/ was examined across changes in phonetic context.
Tongue-jaw interactions included compensatory responses of one
articulatory component to a contextual effect on the position of the
other articulatory component. A similar reciprocity has been
observed in studies that introduced artificial perturbation of jaw
position and studies of patterns of token-to-token variability.
Thus, the lingual-mandibular complex seems to respond in a similar
manner to at least some natural and artificial perturbations.

Several recent models of speech production have posited that speech
gestures are accomplished by groupings of articulators that are temporarily
marshalled together to achieve a common goal. This sort of
functionally-organized goal-oriented behavior has been variously described in
the literature as "motor equivalence" (Abbs, 1979), as "coordinative
structures" (Kelso, Tuller, & [illegible], 1983), and as "functional
synergies" (Kelso, Tuller, & Fowler, 1982). All of these models have
hypothesized that the lingual-mandibular complex operates as one of these
functional synergies during the production of vowels and of alveolar
consonants.



Earlier studies of lingual and mandibular activity have revealed several
sources of evidence to support this hypothesis. First, it has been observed
that jaw height covaries directly with tongue height across vowel categories,
although the precise nature of this relationship may vary across subjects and
across languages (Bell-Berti, Raphael, Pisoni, & Sawusch, 1979; Wood, 1982).
Second, the tongue has been observed to compensate in an utterance-specific
way for experimental manipulation of jaw position. The well-known "bite
block" experiments provide one example of this type of compensation: the
first glottal pulse of a vowel produced with an arbitrarily fixed jaw position
is reported to have approximately the same formant frequencies as the
corresponding unperturbed vowel (Gay, Lindblom, & Lubker, 1981; Lindblom,
Lubker, & Gay, 1979; Lindblom & Sundberg, 1971). In addition, a series of
dynamic perturbation studies provide evidence that the lips and tongue can
compensate for dynamic as well as static perturbation of jaw position.
Folkins and Abbs (1975) applied a resistive load to the jaw during the closing
gesture for a bilabial stop. In all perturbed gestures, bilabial closure was
still achieved and compensatory responses were observed in both upper and
lower lip displacements. This result has been replicated in a number of
experiments by these researchers (Abbs, in press; Abbs & Gracco, 1983) and by
others (Kelso, Tuller, V.-Bateson, & Fowler, 1984; V.-Bateson & Kelso, 1984).
One of these latter experiments (Kelso et al., 1984) provided additional
evidence that this compensatory response is utterance-specific. In that
experiment, electromyographic activity of the orbicularis oris inferior (OOI)
and the genioglossus posterior (GGP) were monitored during repetitions of
/baeb/ and /baez/. Increased activity of the OOI, but not the GGP, was
observed when the jaw was perturbed during the closing gesture for /b/ in
/aeb/; by contrast, increased activity of the GGP, but not the OOI, was
observed when the jaw was perturbed during the closing gesture for /z/ in
/aez/.



A third source of evidence comes from observations of unperturbed speech.
Hughes and Abbs (1976) examined lower lip (with the jaw component removed)
and jaw positions for three vowels across multiple repetitions of each vowel.
They found that a negative correlation between lower lip and jaw position
resulted in a relatively invariant lower lip resultant position for each
vowel. In a similar study, Honda, Baer, and Alfonso (1982) observed a
negative correlation between electromyographic activity of the GGP and jaw
height for multiple repetitions of the vowel /i/ in one subject. Furthermore,
these authors were able to show that the effect of the observed negative
correlations was to reduce variability in first and second formant values for
the vowel.



Although these three types of observations are consistent with the notion
of functional cooperation within the lingual-mandibular complex, it is unclear
what the precise model of functional cooperation is or how these observations
are to be related within such a model. The results of the jaw perturbation
experiments suggest that the tongue and jaw can interact in a compensatory
manner in order to preserve a target articulation. Furthermore, the negative
correlation between electromyographic activity of the GGP and jaw height
observed by Honda et al. (1982) across multiple repetitions of the vowel /i/
suggests that the tongue and jaw may also interact in a compensatory manner
during unperturbed speech, at least in response to token-to-token variability.
On the other hand, the fact that jaw and tongue height positively covary
across vowel categories may simply mean that both articulators function as
independent components of the articulatory feature "vowel height." It is of
interest, therefore, to determine whether compensatory interactions of tongue
and jaw are observed in response to other influences during unperturbed
speech. The coarticulatory context is, of course, one of the major influences
on both tongue and jaw positions for a particular segment. The observations
cited above suggest that either of two patterns of lingual-mandibular
coordination might be observed in the face of context-conditioned variability.
First, it is possible that positive covariation between tongue and jaw
positions will be observed as a function of the coarticulatory context.
Second, it is also possible that a compensatory interaction will be observed
between tongue and jaw positions for a particular segment in response to a
coarticulatory influence of a neighboring segment. This latter possibility is
of particular interest because it would support the claim (Sussman & Westbury,
1981) that there may be active responses to coarticulatory influences and that
these active responses cannot be described simply in terms of phonological
reorganization (i.e., feature-spreading).




Edwards: Contextual Effects on Lingual-Mandibular Coordination



The present experiment was designed to examine the effects of contextual
variability on lingual-mandibular coordination during unperturbed speech.
Tongue blade and jaw positions for /t/ were analyzed in V1CV2 utterances in
which the identities of the preceding and following vowels were systematically
varied in order to produce systematic variation of articulator positions for
the consonant. The data were taken from the existing X-ray microbeam corpus
(Miller, 1983). The advantage of this was that it afforded direct observation
of tongue position over a relatively large number of repetitions (four per
utterance type), at least in comparison to conventional X-ray studies of
tongue position during speech. The disadvantage, however, was that the data
of only a single subject could be analyzed, given the two criteria that were
used to select the utterances for analysis: one, that the phonetic context be
comprised of a syllable-initial /t/ preceded by an unstressed but non-reduced
vowel and followed by a stressed vowel; and two, that the tongue blade pellet
be within 10 mm of the tongue tip.

In order to examine the fine structure of lingual-mandibular
coordination, "resultant" movements of the tongue blade (measured in a fixed
spatial reference frame) were decomposed into two parts, an intrinsic
component and a jaw-related component that reflects the fact that the tongue
rests on the jaw. Contextual influences on these components could, in
principle, result in any one of three patterns of tongue-jaw interaction.
First, it is possible that there is no systematic relationship between the
components of resultant tongue blade movement across phonetic contexts.
Second, it is possible that the tongue blade and jaw covary with a
coarticulatory influence in the same manner as they covary across different
vowel heights. In this case, the tongue blade resultant would display as much
or more variation in position as its two components across different phonetic
contexts. Third, it is possible that the tongue blade and the jaw respond to
a coarticulatory influence as they do to an artificially-induced perturbation
or to token-to-token variability; that is, one articulator may compensate for
a coarticulatory influence on the other articulator in order to preserve an
utterance-specific vocal tract shape or acoustic goal, e.g., formation and
release of the /t/ closure. In this case, less variation in position would be
observed for the tongue blade resultant than for either of its components
across different phonetic contexts.
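
The second and third patterns differ in how the variability of the resultant
compares with that of its components, and that comparison can be sketched in
Python. This is an illustrative reading of the criteria above, not the
analysis used in the study: the use of the sample standard deviation, the
"indeterminate" fallback, the function name, and the example values are all
assumptions.

```python
from statistics import stdev


def classify_pattern(resultant, intrinsic, jaw_related):
    """Compare variability across contexts of the resultant and its parts.

    Each argument is a list of positions (e.g., mean heights in mm), one per
    phonetic context, in the same order.
    """
    sd_r = stdev(resultant)
    sd_i = stdev(intrinsic)
    sd_j = stdev(jaw_related)
    if sd_r < min(sd_i, sd_j):
        # Resultant varies less than either component: reciprocal trade-off.
        return "compensation"
    if sd_r >= max(sd_i, sd_j):
        # Resultant varies as much as or more than both components.
        return "covariation"
    return "indeterminate"


# Hypothetical heights in three contexts; the components trade off exactly.
intrinsic = [1.0, 3.0, 5.0]
jaw_related = [5.0, 3.0, 1.0]
resultant = [i + j for i, j in zip(intrinsic, jaw_related)]
print(classify_pattern(resultant, intrinsic, jaw_related))  # compensation
```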

Method 

Instrumentation

The X-ray microbeam system at the University of Tokyo (Kiritani, Itoh, &
Fujimura, 1975) was used to track the movement of pellets attached to the
tongue blade and to a lower front tooth in the x and y dimensions of the
mid-sagittal plane. The tongue blade pellet placement for this experiment was
approximately 10 mm posterior to the tongue tip. Pellet positions were
recorded every 6.8 ms and subsequently synchronized with the simultaneously
recorded acoustic speech signal.

Speech Sample and Subject

The utterances examined were six V1CV2 types extracted from the following
stimulus sentences:

Bea teats it. Ma teats it.
Bea tots it. Ma tots it.
Bea tats it. Ma tats it.

Thus, the intervocalic consonant was always a word-initial /t/, the preceding
vowel was a word-final /i/ or /a/, and the following vowel was /i/, /a/, or
/ae/.

One adult female speaker of American English (Western Louisiana dialect)
spoke four tokens of each stimulus sentence. The tokens were produced
in randomized order.

Data Processing and Analysis

The axes of the reference frame used to record movements of the tongue
blade resultant and jaw were rotated so that one of the rotated axes would
correspond to the first principal component of variation for jaw movement.
All analyses were performed using this new rotated reference frame aligned
with the primary direction of jaw movement.
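
A rotation of this kind can be sketched in plain Python. The sketch below is
one way to carry out such a rotation, not the procedure actually used in the
study: the closed-form angle of the first principal component for 2-D data,
the function names, and the example coordinates are assumptions made for
illustration. The rotated vertical axis is aligned with the principal
direction, following the description in the Figure 1 caption.

```python
import math


def principal_angle(points):
    """Angle (radians) of the first principal component of 2-D points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    # Orientation of the major axis of the 2x2 scatter matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)


def rotate_to_vertical(points, angle):
    """Rotate points so that the direction `angle` becomes the new y axis."""
    phi = math.pi / 2.0 - angle
    c, s = math.cos(phi), math.sin(phi)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


# Hypothetical jaw pellet positions lying along the line y = 2x.
jaw = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
angle = principal_angle(jaw)
jaw_rotated = rotate_to_vertical(jaw, angle)
# The same rotation would then be applied to the tongue blade coordinates.
```

After rotation, all of the jaw's variation in this example lies along the new
vertical axis, so a single coordinate describes jaw position.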

The simplified model of jaw movement that was used to separate resultant
tongue blade movement into its intrinsic and jaw-related components is shown
in Figure 1. Jaw movement was modeled as pure rotation about a hinge axis
passing through the condyles. Given the relative pellet positions used in the
X-ray microbeam data acquisition, it was estimated that about 80% of jaw
movement was reflected in resultant tongue blade movement.* The mean of the
jaw distribution was taken as the reference position for the jaw; intrinsic
tongue blade positions were derived on a frame-by-frame basis by subtracting
80% of the difference between the observed jaw position and the jaw mean from
the tongue blade resultant position.
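
The frame-by-frame subtraction can be written out as a short Python sketch.
It takes the jaw fraction as 0.8, the value used in the subtraction step and
in the Figure 1 caption. The function name, the choice to compute the jaw
mean over the frames passed in, and the example values are assumptions made
here for illustration.

```python
JAW_FRACTION = 0.8  # share of jaw displacement reflected at the blade pellet


def intrinsic_blade(blade_y, jaw_y, jaw_fraction=JAW_FRACTION):
    """Remove the jaw-related part from resultant blade heights.

    blade_y and jaw_y are equal-length lists of y positions (mm), one per
    frame, in the rotated reference frame. The jaw mean serves as the
    reference position; here it is computed over the frames supplied.
    """
    jaw_mean = sum(jaw_y) / len(jaw_y)
    return [b - jaw_fraction * (j - jaw_mean) for b, j in zip(blade_y, jaw_y)]


# Hypothetical frames: the jaw rises 10 mm while the blade rises 4 mm.
blade = [10.0, 12.0, 14.0]
jaw = [0.0, 5.0, 10.0]
print(intrinsic_blade(blade, jaw))
```

In this made-up example the resultant rises while the intrinsic value falls,
which is the kind of reciprocity the decomposition is meant to expose.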




Figure 1. Jaw movement is approximated as simple rotation about a hinge axis
passing through the condyles, and coordinates of the tongue blade
and jaw are rotated so that the new vertical axis is parallel to
the principal component of jaw movement. Since the blade pellet is
about 80% of the distance from the condyle to the jaw pellet, 80%
of the vertical displacement of the jaw pellet (d) is subtracted
from the blade's y-coordinate to get the "intrinsic" blade value.
The y positions in the new coordinate system of the tongue blade
resultant, the intrinsic tongue blade, and the jaw were measured at four
points in time: acoustic onset of /t/ closure; acoustic release of /t/
closure; peak tongue blade resultant height for /t/; and peak jaw height for
/t/. Peak heights were defined as the highest pellet positions occurring at
points of zero velocity between the vowel-to-consonant and the
consonant-to-vowel transitions. Velocities were derived from the displacement
data by the application of a nearly-equal ripple derivative filter (Kaiser &
Reed, 1977). Mean displacements of the tongue blade resultant, the intrinsic
tongue blade, and the jaw, respectively, for the vowel-to-consonant
transitions were 10, 7, and 3 mm for the /it/ gestures and 28, 23, and 7 mm
for the /at/ gestures, averaged across final vowels. Mean displacements for
the tongue blade resultant, the intrinsic tongue blade, and the jaw,
respectively, for the consonant-to-vowel transitions were 5, 2, and 3 mm for
the /ti/ gestures; 21, 17, and 5 mm for the /ta/ gestures; and 18, 12, and
7 mm for the /tae/ gestures, averaged across initial vowels. The relative
timing of the measured events for most of the utterances was: acoustic
closure, blade peak, jaw peak, acoustic release.* Figure 2 illustrates the
measurement points for one utterance token.




Figure 2. The measurement points (acoustic closure, blade peak, jaw peak,
acoustic release) for one utterance token of /atae/ from the sentence
"Ma tats it." The resultant tongue blade is shown in solid
lines, the intrinsic tongue blade in dashed lines, and the jaw in
dotted lines.

Figure 3. The mean heights of the tongue blade resultant (solid lines), the
intrinsic tongue blade (dashed lines), and the jaw (dotted lines)
are plotted as a function of the preceding vowel at each measurement
point.

Figure 4. The mean heights of the tongue blade resultant (solid lines), the
intrinsic tongue blade (dashed lines), and the jaw (dotted lines)
are plotted as a function of the following vowel at each measurement
point.

Results

The data are summarized in Figures 3 and 4. Figure 3 shows the mean
heights of the tongue blade resultant, the intrinsic tongue blade, and the jaw
plotted as a function of the preceding vowel at each measurement point.
Figure 4 shows the mean heights of the tongue blade resultant, the intrinsic
tongue blade, and the jaw plotted as a function of the following vowel at each
measurement point.

In order to assess the magnitude of the effects of the preceding and
following vowels, a series of two-way analyses of variance was performed
individually for the resultant tongue blade, the intrinsic tongue blade, and
the jaw, using the four measurement points. The results of these 12 analyses
revealed that the effects of the preceding and following vowels are
time-dependent; that is, the main effects of the preceding vowel are
significant at acoustic closure (p < .001 for the resultant tongue blade and
the intrinsic tongue blade) and at blade peak (p < .001 for the intrinsic
tongue blade and the jaw; p < .01 for the resultant tongue blade), but not at
acoustic release. Conversely, main effects of the following vowel are
significant at acoustic release (p < .001 for the resultant tongue blade, the
intrinsic tongue blade, and the jaw), but not at acoustic closure. These
findings corroborate the results of previous experiments (e.g., Barry &
Kuenzel, 1975; Butcher & Weiher, 1976) and support the hypothesis that
movement toward the post-consonantal vowel is not initiated until after
consonant closure, as was proposed by Gay (1977). One inconsistency with the
previous experiments, however, is that one can identify an influence of the
preceding vowel at acoustic release by the significant interaction between V1
and V2 for the tongue blade resultant. This interaction is displayed in
Figure 5; the mean heights of the tongue blade resultant are plotted for each
V1-V2 combination at this measurement point. An analysis of this interaction
revealed that the V2 /ae/ was the sole basis for this significant effect.
Because /ae/ was not used in the VCV utterances of the previous experiments,
such an effect could not be observed.







Figure 5. The mean heights of the tongue blade resultant following /a/ (solid
lines) and following /i/ (dashed lines) are plotted as a function
of the following vowel at acoustic release.

Significant main effects were examined at each measurement point in order
to determine if compensatory interactions occurred between articulatory
components as a function of phonetic context. An interaction was considered
behaviorally salient if it fulfilled two conditions: one, the main effect was
statistically significant for both articulatory components; and, two, the
direction of the effect was different for the two components for at least one
level of that factor. Given these criteria, two instances of compensatory
behavior between the components of tongue blade movement were identified: one,
at blade peak for carryover influences; and, two, at acoustic release for
anticipatory influences. Of course, perfect compensation would yield tongue
blade resultant positions that remained invariant across all changes in
phonetic context. While the observed compensatory patterns did not produce
such an absolute invariance, they did serve to reduce the range of variation
in the resultant tongue blade position. Let us consider these two instances
of compensation separately.

Carryover coarticulatory influences are illustrated in Figure 3. Consider
the second measurement point, blade peak, where a compensatory relationship
between jaw and intrinsic tongue blade movements was observed. In this graph,
the height of the intrinsic tongue blade varies directly with the height of
the preceding vowel: it is 2.5 mm higher after /i/ than after /a/ (p < .001).
The jaw, by contrast, varies inversely with the height of the preceding vowel:
it is 1.2 mm lower after /i/ than after /a/ (p < .001). The net effect of
this interaction between the intrinsic tongue blade and the jaw is that the
tongue blade resultant displays less variation in position (1.1 mm) as a
function of the preceding vowel than does the intrinsic tongue blade.

Anticipatory coarticulatory effects are illustrated in Figure 4. Consider
the final measurement point, acoustic release, where another compensatory
relationship between intrinsic tongue blade and jaw movements was observed.
Post-hoc paired comparisons (Newman-Keuls test) revealed that the means of /a/
and /ae/ are significantly different (p < .05) for both the intrinsic tongue
blade and the jaw. It can be seen in Figure 4, however, that these two means
are not significantly different for the tongue blade resultant. This suggests
that the tongue and jaw may also interact to compensate for some, but not all,
anticipatory influences on /t/ articulation. That is, although the height of
the resultant tongue blade is strongly influenced by the degree of
constriction for the following vowel (i.e., whether it is high or low), the
tongue-jaw interaction serves to reduce the effect of the location of this
constriction (i.e., whether it is front or back).

Discussion 

The results presented here come from the data of a single speaker producing
only four repetitions of six utterance types. Given the ubiquitous intra-
and inter-speaker variability that has been found in speech production
research, these findings should be interpreted cautiously. Nevertheless, these
results suggest that the lingual-mandibular complex responds to some
coarticulatory influences in the same manner as it responds to
artificially-induced perturbations and to token-to-token variability. That is,
the tongue and the jaw may interact in a compensatory fashion, presumably in
order to achieve a common goal. Given the data under consideration, it is
unclear how to characterize this goal. One possibility is that these
tongue-jaw interactions are instances of compensation in order to preserve a
target articulation, defined in its most narrow sense. Even though vocal tract
occlusion for /t/ is accomplished by the tongue tip, rather than the tongue
blade, the position of the tongue blade is constrained in that it cannot fall
outside the range of positions that permit tongue tip contact with the hard
palate.

Another possibility is that the intrinsic and the jaw-related components
of tongue blade resultant position are coordinated in order to decrease the
range of variation in the formant transitions during the formation and release
of the stop closure. While vocal tract occlusion for /t/ is accomplished by
the tongue tip, tongue blade position influences the shape of the cavity
behind the occlusion during the final portion of the transitional movement
from vowel-to-consonant and during the initial portion of the transitional
movement from consonant-to-vowel. A consequence of reducing spatial
differences in tongue blade resultant position may be to reduce acoustic
variation accordingly. This does not deny the fact that the acoustic
transitions vary as a function of the preceding and following vowels. Rather,
it suggests that the observed range of variation may be less than what would
occur in the absence of these tongue-jaw interactions. This interpretation
suggests a line of further research.

Whatever the interpretation, these results provide an example of
compensatory inter-articulator coordination in response to contextual
influences. Although the data presented here are limited in scope, the results
support the hypothesis that observed lingual-mandibular linkages during
movement extend beyond a simple mechanical connection between the jaw and the
tongue blade. Inter-articulator cooperation, at least for alveolar consonant
production, appears to be coordinated to reduce positional variation in
resultant tongue blade height generated by the coarticulatory context. The
generality of this result, as well as a more detailed description of the
conditions under which it is observed, remains to be determined.

Footnotes

*This model is, of course, physiologically inaccurate in that jaw movement
during speech includes both rotation and translation (Gibbs & Messerman,
1972). However, at the level of analysis reported here, the results do not
depend on whether the calculation of the jaw component is based on a purely
rotational model or on a combined rotation and translation model.

*It should be noted that absolute timing (i.e., the durations between
each of the measured events) differed systematically as a function of phonetic
context. However, a detailed analysis of these differences is beyond the
scope of this paper.

THE TIMING OF ARTICULATORY GESTURES: EVIDENCE FOR RELATIONAL INVARIANTS*

Betty Tuller† and J. A. Scott Kelso††



Abstract. In this experiment we examined the effects of changing
speaking rate and syllable stress on the space-time structure of
articulatory gestures. Lip and jaw movements of four subjects were
monitored during production of selected bisyllabic utterances in
which stress and rate were orthogonally varied. Analysis of the
relative timing of articulator movements revealed that the time of
onset of gestures specific to consonant articulation was tightly
linked to the timing of gestures specific to the flanking vowels.
The temporal stability observed was independent of large variations
in displacement, duration, and velocity of individual gestures. The
kinematic results are in close agreement with our previously reported
EMG findings (Tuller, Kelso, & Harris, 1982a) and together provide
evidence for relational invariants in articulation.

A central goal for speech research is to understand the perceptual
constancy of a given unit (e.g., feature, phoneme, syllable) in the absence of
a unique set of acoustic or articulatory properties. For example, linguistic
constraints, such as phonetic context, level of stress, and speaking rate,
produce a wide range of articulatory patterns for the same abstract linguistic
type. The approach that we adopt here is to ask whether constancies in
relational aspects of articulatory patterning (relational invariants) can in
fact be observed across these speech-relevant transformations. The present
work explores the possibility that the relative timing of articulatory
gestures spanning several segments is maintained over suprasegmental
variations in stress and speaking rate.

Our interest in the theory that relational invariants (Kelso, 1981) are
essential to speech communication is motivated by research from three
disparate sources. First, in nonspeech motor skills such as bimanual
coordination, handwriting, typewriting, postural control, and locomotion, the
relative timing of kinematic or electromyographic events is maintained across
scalar changes in rate and force production (see, for review, Kelso, Tuller, &
Harris, 1983). For example, as a cat walks faster, the duration of the
"step cycle" of each limb decreases and the propulsive force produced by limb
extension increases (Grillner, 1975; Shik & Orlovskii, 1976). However, the
timing of activity in the limb extensor muscles is constant relative to the
time between successive flexions (Engberg & Lundberg, 1966).

A second source of motivation for examining relational invariants is the
demonstration that perception of certain linguistic distinctions depends on
the relative (not absolute) durations of acoustic constituents. For example,
perception of the voiced/voiceless distinction in medial stop consonants is
strongly influenced by the duration of silence (closure) preceding release of
the consonant. However, Port (1979) found that the duration of silence
necessary to specify that the medial stop consonant was voiceless decreased as
speaking rate increased (cf. Miller & Grosjean, 1981; Miller & Liberman,
1979; Pickett & Decker, 1960; Summerfield, 1975).

A third motivation for our approach comes from investigations of speech
production. These studies, though few in number, suggest that the relative
timing of articulatory kinematics at the segmental and syllabic levels is
unaffected by suprasegmental variations (e.g., Kent & Moll, 1975; Kent &
Netsell, 1971; Kozhevnikov & Chistovich, 1965; Löfqvist & Yoshioka, 1981).

In an earlier electromyographic study (Tuller, Kelso, & Harris, 1982a),
we asked whether stable relative timing across suprasegmental variation is
also an appropriate characterization of intersegmental speech organization.
Specifically, we asked whether the muscle activity underlying production of
the vowels and medial consonant in utterances such as /pi#pap/ and /pa#pap/
would maintain any temporal systematicity across rates and stress levels. Our
strategy was to define periods of muscle activity corresponding to the
interval between successive vowels, and successive consonants. We examined the
timing of various aspects of muscle activity for the intervocalic consonant
relative to that for the vowel interval, and the timing of muscle events for
interconsonantal vowels relative to the consonant interval. Comparing the
stability of these various timing relations, we found one very consistent
result: The average duration of the interval between onsets of muscle activity
for successive vowels was linearly related to the average latency (relative to
the first vowel) of medial consonant-related muscle activity. Other possible
relationships, such as those based on periods of muscle activity related to
production of successive consonants, did not show the same degree of
stability.

One shortcoming of our electromyographic experiment (Tuller et al.,
1982a) is that we could only examine the stability of relative articulatory
timing on the averaged ensemble of tokens. We could not examine whether the
relationship also holds when token-to-token variability is allowed because it
is not always possible to define onsets and offsets of muscle activity for
individual repetition tokens of an utterance (see Baer, Bell-Berti, & Tuller,
1979, for a discussion of temporal measures of individual vs. averaged EMG
records). Moreover, the eventual goal is to understand the speech signal as
structured by movements of the articulators, but the general form of the
relationship between electromyographic signals and kinematic variables is by
no means transparent. For these reasons, we performed a similar experiment in
which articulator movement trajectories were measured and their relative
timing examined.

Method

Subjects

The subjects were three adult females and one adult male. All were native
speakers of English. One subject (BT) was aware of the experiment's purpose.

Materials and Procedure

The speech sample included utterances of the form
b-vowel-consonant-vowel-b, with the medial consonant presented and spoken as
the first element of the second syllable. Consonants and vowels were chosen to
maximize lip and jaw movement. Thus, the first vowel (V1) was either /a/ or
/ae/, the second vowel (V2) was always /a/, and the medial consonant (C) was
either /b/, /p/, /w/, or /v/ (e.g., /ba#wab/, /bae#pab/, etc.). Each utterance
was spoken with two stress patterns, with primary stress placed on either the
first or second syllable. The subjects read quasi-random lists of these
utterances at two self-selected speaking rates, one conversational and the
other somewhat faster. Each utterance was embedded in the carrier phrase
"It's a ___ again" to reduce the effects of initial and final lengthening and
prosodic variations. Three subjects produced twelve repetitions, and one
subject (BT) ten repetitions, of the 32 utterance types (8 phonetic strings x
2 rates x 2 stress patterns), for a total of 1472 tokens.

Data Recording 

Articulatory movement in the up-down direction was monitored using an
optoelectronic device (a modified SELSPOT system). In this system,
lightweight, infrared, light-emitting diodes (LEDs) are focused on a
photodetector that, with the associated electronics, outputs analog signals
corresponding to the x and y position of each LED over time. In this
experiment, the LEDs were attached to the subject's upper lip, lower lip, jaw,
and nose. In order to minimize head movements during the experiment, a head
rest was used and output of the LED attached to the nose was continuously
displayed on an oscilloscope placed directly in front of the subject, who was
told to keep the display on the zero line.

Acoustic recordings were made simultaneously with the movement tracks and
both were computer-analyzed on subsequent playback from FM tape. Acoustic
tokens were first excised from the carrier phrase using the PCM system at
Haskins Laboratories, then played in random order to four listeners who judged
each token's phonetic make-up and stress pattern. Tokens were omitted from
further analysis if more than one listener judged the token as having a
different stress pattern from the appropriate one or if any phonetic errors
were noted. For only one speaker (JE) was it necessary to omit more than two
tokens of any given utterance type; the minimum number of utterance tokens for
this speaker was seven.

The movement records were computer-sampled at 5-ms intervals. To correct
for up-down head movements, output of the nose LED was subtracted (by a
computer program) from the output of the LEDs attached to the lips and jaw.
Movements of the lower lip were isolated by subtracting movements of the jaw.
Velocity records for the jaw, upper lip, lower lip, and lower lip corrected
for jaw movement were obtained by software calculation of the first derivative
of the position records. For each token, the times at which movements began
and ended (indexed by points of zero velocity) were obtained individually for
the jaw, the upper lip, and the lower lip corrected for jaw movement.
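
The processing chain just described (head-movement correction, jaw subtraction, differentiation, and location of zero-velocity points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the original software: the function names, the sample records, and the sampling interval `SAMPLE_INTERVAL_S` are hypothetical, and a simple central difference stands in for whatever derivative routine was actually used.

```python
SAMPLE_INTERVAL_S = 0.005  # assumed sampling interval in seconds (parameter)


def subtract(a, b):
    """Sample-by-sample difference a - b of two equally long records."""
    if len(a) != len(b):
        raise ValueError("records must have the same length")
    return [x - y for x, y in zip(a, b)]


def velocity(position, dt):
    """First derivative by central differences (one-sided at the ends)."""
    n = len(position)
    if n < 2:
        raise ValueError("need at least two samples")
    v = [0.0] * n
    v[0] = (position[1] - position[0]) / dt
    v[-1] = (position[-1] - position[-2]) / dt
    for i in range(1, n - 1):
        v[i] = (position[i + 1] - position[i - 1]) / (2 * dt)
    return v


def zero_velocity_samples(v):
    """Indices i >= 1 where velocity is zero or has just changed sign."""
    return [i for i in range(1, len(v))
            if v[i] == 0.0 or v[i - 1] * v[i] < 0]


# Hypothetical vertical positions (arbitrary units), one value per sample.
nose_raw = [0.0, 0.5, 0.5, 0.0, 0.0, 0.5, 0.5]
jaw_raw = [1.0, 1.5, 1.5, 1.0, 1.0, 1.5, 1.5]
lower_lip_raw = [1.0, 2.5, 3.5, 2.0, 1.0, 2.5, 3.5]

jaw = subtract(jaw_raw, nose_raw)              # head movement removed
lower_lip = subtract(lower_lip_raw, nose_raw)  # head movement removed
lip_only = subtract(lower_lip, jaw)            # lower lip corrected for jaw

turning_points = zero_velocity_samples(velocity(lip_only, SAMPLE_INTERVAL_S))
onset_times_s = [i * SAMPLE_INTERVAL_S for i in turning_points]
```

With real, noisy records a smoothing step before differentiation would normally be needed; it is omitted here to keep the sketch short.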

Results 

The main thrust of this study was to examine the relative timing of
articulatory movements. In keeping with our earlier work and with various
studies of nonspeech motor skills, we chose to define articulatory timing in
terms of the phase relations among events in the movement trajectories. This
requires delimiting some period of articulatory activity and the latency of
occurrence of an articulatory event within the defined period. Over linguistic
variations, in this case stress and rate, these intervals will change in
their absolute durations. The question is whether they change in a
systematically related manner.

Our earlier electromyographic study (Tuller, Kelso, & Harris, 1982a)
showed maximal temporal systematicity when the latency of onset of
consonant-associated muscle activity was considered relative to the period
between onsets of muscle activity associated with production of successive
vowels. We used this result to guide our investigation of articulatory
kinematics, although the latencies of gestures associated with vowel events
were also examined relative to the interval between gestures associated with
successive consonant productions.

Figure 1 shows the acoustic signal and position-time functions for the
jaw, upper lip, and lower lip (independent of jaw movement) for one token of
/ba#pab/, spoken with primary stress on the second syllable. The figure
illustrates the articulatory intervals discussed in the rest of this article.
In all cases, the onsets of articulator movements (A through F in Figure 1)
were determined empirically from zero crossings in the velocity records of the
individual repetitions (not shown in Figure 1). Points labeled A and B are
the onsets of jaw lowering associated with production of the first and second
vowels, respectively. The interval from A to B is referred to hereafter as
the "gestural cycle associated with production of successive vowels" or, more
loosely, the "vocalic cycle." Similarly, the intervals from C to D and from E
to F are referred to as "gestural cycles associated with production of
successive consonants" or "consonantal cycles," indexed by movement onsets of
the upper lip and lower lip, respectively. Within the vocalic cycle of each
token, we measured the latency of onset of consonant-related movement of the
upper and lower lips (i.e., the intervals A-C and A-E). Within the consonant
cycle of each token, we determined the latency of onset of jaw lowering
associated with vowel articulation (C-B and E-B).

One kinematic measure that is intuitively commensurate with the temporally
stable EMG measure is the latency of onset of lower lip raising for producing
the medial labial consonant (A-E) relative to the vocalic cycle (the period
from the onset of jaw lowering for the first vowel to the onset of jaw
lowering for the second vowel (A-B)). These measures are illustrated
quantitatively in Figure 2, which shows measurements for one subject's (JE)
productions of the utterances /ba#bab/, /ba#pab/, /ba#vab/, and /ba#wab/. Each
point on a graph is one token of an utterance type, and the four stress-rate
conditions are plotted on a single graph.

Figure 1. Movements of the jaw, upper lip, and lower lip corrected for jaw
movement, and the acoustic signal for one token of /ba#pab/.
Articulator position (the y-axis) is shown as a function of time.
Onsets of jaw and lip movements (empirically determined from zero
crossings in the velocity records) are indicated (see text for
details).



A Pearson product-moment correlation was calculated for each distribution.
Obviously, the calculated correlations are very high: .93, .92, .94,
and .92. However, the changes that occur are not ratiomorphic; the calculated
regression lines (not shown in the figures) do not intercept the y-axis at
the origin. Utterances with /ae/ as the first vowel showed essentially
identical results, with correlations for this speaker of r = .9 and above.
Again, the changes were systematic but not ratiomorphic.
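
The two quantities discussed here, the correlation between latency and period and the y-intercept of the regression line, can be computed as in the sketch below. It is an illustration only: the onset times are invented, the names `pearson_r` and `least_squares_line` are hypothetical, and the labels A, B, and E follow Figure 1. A y-intercept away from zero is what "not ratiomorphic" refers to: latency changes systematically with period without being a fixed proportion of it.

```python
from math import sqrt


def _sums(x, y):
    """Return means and centered sums of squares/products for x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be equally long with >= 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson product-moment correlation of x and y."""
    _, _, sxx, syy, sxy = _sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant data")
    return sxy / sqrt(sxx * syy)


def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope*x + b."""
    mx, my, sxx, _, sxy = _sums(x, y)
    if sxx == 0:
        raise ValueError("slope undefined for constant x")
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical onset times in ms for four tokens: (A, B, E) as in Figure 1.
tokens = [(10, 210, 130), (5, 265, 155), (0, 320, 180), (20, 420, 240)]
periods = [b - a for a, b, e in tokens]    # vocalic cycle, A-B
latencies = [e - a for a, b, e in tokens]  # lower lip latency, A-E

r = pearson_r(periods, latencies)
slope, intercept = least_squares_line(periods, latencies)
```

In this made-up example latency equals half the period plus 20 ms, so the correlation is perfect while the regression line misses the origin by 20 ms.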

Figure 3 also shows the timing of medial consonant articulation relative
to the vocalic cycle for the same subject as in Figure 2. In this case,
however, we have defined the onset of consonant articulation as the onset of
the lowering gesture of the upper lip (interval A-C in Figure 1). Utterances
with medial /v/ are not included because no systematic upper lip movement was
noted. Again, the changes in duration of the two measured intervals are highly
correlated for utterances with /a/ as the first vowel (shown in Figure 3) as
well as in utterances whose first vowel was /ae/. It can be seen from the
figures, however, that correlations within each stress-rate condition tend to
be lower than the correlations across conditions, particularly in those
conditions whose range is small along one axis.

Figure 2. Timing of lower lip raising associated with medial consonant
articulation as a function of the vocalic cycle for one subject's (JE)
productions of /ba#Cab/ utterances. Filled circles are tokens spoken
at a conversational rate with primary stress on the first syllable;
open circles are tokens spoken at a conversational rate with
stress on the second syllable; filled triangles are spoken at the
faster rate with primary stress on the first syllable; open triangles
are the faster rate, stress on the second syllable.

Figure 3. Timing of upper lip lowering associated with medial consonant
articulation as a function of the vocalic cycle for one subject's (JE)
productions of /ba#Cab/ utterances. Symbols as in Figure 2.

Although Figures 2 and 3 illustrate the data from only a single subject
(JE), the three other subjects showed essentially the same pattern. The left
half of Table 1 shows the values for all four subjects obtained by correlating
the vocalic cycles with the latency of onset of consonant articulation.
Correlations obtained when consonant articulation is defined by the raising
gesture of the lower lip are shown separately from correlations in which
consonant articulation is defined by the lowering gesture of the upper lip.
The lowest correlation obtained for any utterance was .84 (accounting for 71%
of the variance). Let us underscore that these high correlations occur even
though other aspects of the movements, such as their displacement, velocity,
and duration, change substantially (Tuller, Harris, & Kelso, 1982). The right
half of Table 1 shows the correlations obtained between the within-syllable
consonantal cycle and the latency of production of the intervening vowel. In
Figure 1, these measures correspond to the intervals C-D and C-B for the upper
lip and jaw, and E-F and E-B for the lower lip and jaw. The resulting
correlations span a wide range of values (from -.02 to .72), clustering in the
.2 to .65 range.

One question that arose from this analysis was whether the high
correlations obtained between the duration of the vocalic period and the
timing of the medial consonant could be a statistical artifact. Most of the
durational stretching and shrinking across rate and stress changes occurs in
the vowel-related articulator movements. This alone might account for the fact
that the correlations between two intervals that both contain the
vowel-related movements are higher than the correlations between intervals not
containing this common element (cf. Barry, 1983; Tuller, Kelso, & Harris,
1983).

To explore this possibility we determined the correlation coefficients
that would occur if consonant gesture latencies occurred at random with
respect to gestural periods for successive vowels. To this end, we subtracted
the latency (A-C or A-E in Figure 1) from the period (A-B) for all individual
tokens of an utterance type. The resulting values (C-B or E-B) were then
randomly paired with a different latency value. Adding the members of a pair
resulted in a new period. The new pairs of values have the same property as
our original measure: variability in vowel duration contributes both to period
and to latency. We then calculated the correlations between the new pairs.
Using Fisher's r-to-z transform and t-tests, we compared the new correlations
with the original correlations obtained from the period and latency pairs as
measured from the data. Figure 4 shows the difference between the z-score for
the actual correlation and the z-score for the correlation obtained with
random pairing of periods and latencies for the 56 comparisons. In all cases,
the correlation obtained from the randomly paired periods and latencies was
significantly lower (at least at the .05 level) than the correlation of
periods and latencies that actually occurred.
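
The random-pairing control can be sketched as below. This is one possible reading of the procedure, written for illustration: the function names, the token values, and the seed are hypothetical, "a different latency value" is implemented as the latency of a different token, and only the z-score difference plotted in Figure 4 is computed (the t-tests are not reproduced).

```python
import random
from math import atanh, sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fisher_z(r):
    """Fisher's r-to-z transform, z = atanh(r); requires |r| < 1."""
    return atanh(r)


def randomly_repaired(periods, latencies, rng):
    """Pair each remainder (period - latency) with another token's latency.

    Returns (new_periods, new_latencies), where each new period is the
    sum of a remainder and the latency it was randomly paired with.
    """
    n = len(periods)
    if n != len(latencies) or n < 2:
        raise ValueError("need two equally long lists with >= 2 tokens")
    remainders = [p - l for p, l in zip(periods, latencies)]
    order = list(range(n))
    # Reshuffle until no remainder keeps the latency of its own token.
    while any(i == j for i, j in enumerate(order)):
        rng.shuffle(order)
    new_latencies = [latencies[j] for j in order]
    new_periods = [rem + lat for rem, lat in zip(remainders, new_latencies)]
    return new_periods, new_latencies


# Hypothetical period and latency values in ms for six tokens.
periods = [200, 260, 320, 400, 240, 360]
latencies = [118, 152, 181, 219, 141, 199]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
rand_periods, rand_latencies = randomly_repaired(periods, latencies, rng)

z_actual = fisher_z(pearson_r(periods, latencies))
z_random = fisher_z(pearson_r(rand_periods, rand_latencies))
z_difference = z_actual - z_random  # the quantity plotted in Figure 4
```

Because each new period still contains the latency it is paired with, any correlation produced purely by this shared component survives the shuffling; the comparison therefore asks how much of the observed correlation exceeds that baseline.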

A related question is whether our results are due to an overall tempo
effect (MacNeilage, in press) and thus do not specifically implicate the
gestural cycle for vowels as an important variable in speech motor control. We
tested this possibility by examining the interval from the onset of jaw
lowering for the second vowel to the onset of upper lip lowering for the final
consonant (interval B-D in Figure 1) relative to the interval between jaw
lowering for successive vowels (interval A-B). Notice that in this analysis,
the defined cycle does not include the relevant consonant-related
articulation. Nevertheless, these variables should still be strongly
correlated if an over-

Table 1

Pearson Product-Moment Correlations for All Four Subjects, Describing
Relationships Between Various Periods and Latencies, as Indicated

                 Vocalic Cycle                       Consonantal Cycle

       aba(a)  aeba(a)   aba(b)  aeba(b)     aba(c)  aeba(c)   aba(d)  aeba(d)
  CH    .93      .91      .98      .97         --      .02      .?9      .13
  NM     --      .89      .92      .9?         --       --      .28      .62
  JE     --      .90      .97      .90        .63      .55       --      .22
  BT    .95      .95      .96      .93        .52      .61       --       --

       apa(a)  aepa(a)   apa(b)  aepa(b)     apa(c)  aepa(c)   apa(d)  aepa(d)
  CH    .96      .87      .95      .97       -.02      .35      .22      .26
  NM    .93       --      .91      .92         --      .22      .61     -.02
  JE    .92       --      .97      .89        .39      .29      .36      .??
  BT    .97      .96      .96      .93        .71      .31       --      .21

       awa(a)  aewa(a)   awa(b)  aewa(b)     awa(c)  aewa(c)   awa(d)  aewa(d)
  CH    .91      .95      .91      .?0        .71      .31      .61      .08
  NM    .93      .91      .95      .9?        .51      .51      .?3      .69
  JE     --      .92      .89      .84        .24      .72      .37      .05
  BT    .97      .93      .91       --        .33      .38      .51       --

       ava(a)  aeva(a)                       ava(c)  aeva(c)
  CH     --       --                          .69      .21
  NM    .86      .89                          .51      .63
  JE    .92      .95                          .??      .5?
  BT    .96      .90                          .56      .33

(a) Latency = V1 (jaw) to medial C (lower lip); period = V1 to V2 (jaw).
(b) Latency = V1 (jaw) to medial C (upper lip); period = V1 to V2 (jaw).
(c) Latency = C2 (lower lip) to V2 (jaw); period = C2 to C3 (lower lip).
(d) Latency = C2 (upper lip) to V2 (jaw); period = C2 to C3 (upper lip).
Note. "--" marks a cell with no legible value; "?" marks an illegible digit.

Figure 4. Differences between z-scores for the "actual" correlations and
z-scores for the correlations obtained by random pairing of periods
and latencies.



all tempo effect is involved. The resulting linear correlations, however,
were extremely weak, ranging from -.6 to .02 across the four speakers, and
clustering (83%) in the -.1 to -.4 range. The correlations were generally
negative because stressed and unstressed syllables alternate in our data set.
Thus, long vowel intervals (utterances with the first syllable stressed) are
followed by short lip closing gestures (unstressed, syllable-initial
consonants). Taken together with the results of randomly pairing periods and
latencies, these results indicate that neither variations in vowel duration nor
overall speech tempo can account for the systematic relationship between the
timing of intervocalic consonant articulation and the period between its
flanking vowels.
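
The random-pairing control can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that the "z-scores" are Fisher r-to-z transforms, uses a single random re-pairing, and runs on hypothetical period and latency values; the function names are ours.

```python
import math
import random


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_z(r):
    """Fisher r-to-z transform (assumed here); defined for -1 < r < 1."""
    return math.atanh(r)


def z_actual_minus_random(periods, latencies, seed=0):
    """z for the actual pairing minus z for one random re-pairing."""
    rng = random.Random(seed)
    shuffled = list(latencies)
    rng.shuffle(shuffled)  # break the period-latency pairing
    z_actual = fisher_z(pearson_r(periods, latencies))
    z_random = fisher_z(pearson_r(periods, shuffled))
    return z_actual - z_random


# Hypothetical values in ms: latency grows almost linearly with period.
periods = [200, 230, 260, 300, 340, 380, 420, 470]
latencies = [92, 101, 118, 131, 150, 166, 187, 204]
print(z_actual_minus_random(periods, latencies))
```

A difference well above zero means that the actual pairing carries structure that a chance pairing of the same values does not.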

Another prediction of the stable relative timing of consonant and vowel
articulations is that the small changes in duration of consonantal gestures
should be correlated with the relatively larger changes in duration of
vowel-related gestures. To explore this further, we determined the duration
of "vowel-specific movement," defined as the interval from the onset of jaw
lowering for the first vowel to the onset of lip movement for the medial
consonant (A-C and A-E in Figure 1), and the duration of "consonant-specific
movement," defined as the interval from the onset of lip movement for the
medial consonant to the onset of jaw lowering for the second vowel (C-B and E-B
in Figure 1). We then correlated these measures across stress and rate
conditions for each utterance type and, using t-tests, determined whether the
resulting correlations were significantly greater than zero. In all 56 cases,³
the durations of consonant and vowel movements (as defined above) were
positively correlated (rs ranged from .52 to .87, ts ranged from 3.55 to
10.29, ps < .01).
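
The t statistic for a single correlation can be sketched with the standard formula t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom. The sample size is not stated in this passage; n = 36 is our assumption, chosen because it reproduces both ends of the reported range. The function name is ours.

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson r against zero."""
    if n < 3 or not -1.0 < r < 1.0:
        raise ValueError("need n >= 3 and -1 < r < 1")
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Assumed n = 36 observations per correlation (not given in the text).
for r in (0.52, 0.87):
    print(r, round(t_from_r(r, 36), 2))
```

Under that assumption the smallest and largest reported correlations give t values of 3.55 and 10.29, matching the range reported above.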

Although the above analyses examine the commonalities in organization
across disyllables with different intervocalic consonants, we expected to
observe consonant-related differences predictable from the acoustic-phonetic
literature. For example, the period of voicing for a vowel prior to
supraglottal occlusion for a voiced stop consonant such as /b/ tends to be
longer than voicing for the same vowel before closure for the voiceless stop
consonant /p/ (e.g., House, 1961; House & Fairbanks, 1953; Peterson &
Lehiste, 1960). For the four speakers in this study, the acoustic duration of
the voiced portion of each vowel was measured and ANOVAs computed to test the
effect of consonant (/p/ vs. /b/), stress, and speaking rate on vowel-related
voicing duration. The acoustic measures were from the first full pitch period
after initial consonant release to the first acoustic indication of closure
for the medial stop consonant. ANOVA revealed that all four speakers produced
significantly longer voicing for vowels before /b/ than before /p/ (Fs(1,59)
ranged from 39.02 to 78.61, ps < .001), although for one speaker (CH) this
effect was rather small (22 ms mean difference), possibly because the medial
consonant was not syllable final.

In light of these results, one might predict that the latency of consonant
articulation relative to the preceding vowel (as indexed, for example, by
the onset of lower lip raising) would be later for /b/ than for /p/.
Examination of Figures 2 and 3 reveals, perhaps surprisingly, that the range of
latencies for the onset of lower lip movement changes only slightly across
intervocalic consonants. Although the mean latency values within each
stress-rate condition tend to be later for /b/ than for /p/, this small
difference does not account for the total measured acoustic difference. The
onset of upward jaw movement, however, does migrate with context, being 20 ms
to 40 ms earlier in vowel-/p/ than in vowel-/b/ utterances.

Another hypothesis is that the period-latency functions might reflect the
manner of consonant production. In fact, the calculated regression lines (not
shown in the figures) for /v/ and /w/ did tend to have flatter slopes,
reflecting earlier articulatory onsets, than the regression lines for /p/ and
/b/. However, the ordering of slopes is not identical across subjects. We
also evaluated consonant effects on the duration and peak instantaneous
velocity of upward movements of the composite lower lip-jaw system. A
significant consonant effect was found for both the duration and velocity of
lower lip movements for all speakers (Fs(3,240) ranged from 6.86 to 351.8,
ps < .001). Scheffé post-hoc comparisons showed that for three of the four
speakers, the duration of the lower lip gesture upward was longer for vowel-/v/
and vowel-/w/ transitions than for vowel-/p/ and vowel-/b/ transitions
(ps < .05). In addition, the peak instantaneous velocity of the composite lower
lip-jaw system for all speakers was higher for vowel-/p/, vowel-/b/, and
vowel-/v/ gestures than for vowel-/w/ gestures (ps < .05). Although the
difference in peak instantaneous velocity for vowel-/p/ and vowel-/b/ gestures
was just short of significance at the .05 level, all four speakers showed a
tendency for vowel-/p/ gestures to have higher velocities than vowel-/b/
gestures (see also Kuehn, 1973).
Discussion

To summarize, in this experiment the timing of movement onset for gestures
appropriate to consonants was tightly linked to the timing of movement
onsets for vowel-related gestures.⁴ This stability of relative articulatory
timing was observed for all utterances examined and was independent of often
large variations in duration, displacement, and velocity of individual
articulators. Moreover, performance of the one speaker who was aware of the
experiment's aim was in all ways similar to the performance of the three naive
speakers. These kinematic results are compatible with the earlier EMG findings
(Tuller, Kelso, & Harris, 1982a) and together, we feel, provide evidence
for relational invariants in articulation. Nevertheless, a few caveats are in
order.

First, the measure of movement onset is not meant to be isomorphic with
the measure of EMG onset in the earlier experiment. The relationship between
parameters of muscle activity and the resulting kinematics has yet to be
elucidated in systems far less complex than the vocal apparatus (e.g., Bigland
& Lippold, 1954; Cooke, 1980; Wallace & Wright, 1982).

Second, we have chosen to examine the relative timing of onsets of movement
trajectories but do not mean to imply that movement onset enjoys privileged
status as a directly controlled variable. A good deal of debate in the
motor control literature surrounds the question of what variables the nervous
system regulates (cf. Stein, 1982, and commentaries). Nevertheless, we feel
confident that the timing of onset of articulator movement is highly correlated
with whatever kinematic or dynamic aspects of movement are apposite to the
nervous system.



A third reason for caution when generalizing these results is that we did
not examine the behavior of the most important articulator, namely, the
tongue. Although we expressly restricted our corpus to consonants having
minimal tongue involvement (so far as we know), any adequate account of speech
motor control must include a description of lingual articulation. These data
are buttressed, however, by results of a recent, but more limited, parallel
experiment that monitored tongue movements of one speaker (Harris, Tuller, &
Kelso, 1983; see also Ostry, Keller, & Parush, 1983; Parush, Ostry, &
Munhall, 1983).

Fourth, we have only examined phonetically very simple material: the
behavior of single consonants between two fairly unreduced vowels, with the
intervocalic consonant in syllable-initial position. The description is
incomplete in that it does not address the syllable affiliation of the
consonant, the number of intervocalic consonants, the role of extremely reduced
vowels or schwa, or cases where extensive anticipatory coarticulation is
possible.

Despite these limitations, the view that the period between successive
vowel gestures is a significant articulatory event and that consonant gestures
are timed relative to such periods is supported by the literature on
compensatory shortening and coarticulation. For example, it is well known that
intervocalic consonants shorten the measured acoustic duration of the
surrounding vowels (e.g., Lindblom & Rapp, 1973). This may mean that all
aspects of the articulation of vowels are shortened when consonants follow or
precede them. Alternatively, it may mean that the consonants and vowels are
produced in concert, with the trailing edges of the vowels progressively
"overlaid," as it were, by the consonants (Fowler, 1981). In this view, vowel
articulations occur continuously throughout the production of consonants and
consonant clusters. An articulatory organization of this sort was first
proposed by Ohman (1966), to explain the changes in formant transitions of
intervocalic consonants as a function of the flanking vowels. Fowler (1977)
has elaborated this view by suggesting that the vocalic cycle plays an
important organizing role in speech production and perception. More recent
articulatory evidence that the influence of both preceding and following vowels
is apparent throughout the intervocalic consonant might also be interpreted as
indicating a significant vowel-to-vowel articulatory period (Barry & Kuenzel,
1975; Butcher & Weiher, 1976; Gay, 1977; Harris & Bell-Berti, 1984;
Sussman, MacNeilage, & Hanson, 1973).

In conclusion, we believe that the data in the study reported here indicate
an organizational scheme that speech production shares with many other
forms of coordinated activity (see Boylls, 1975; Fowler, Rubin, Remez, &
Turvey, 1980; Grillner, 1982; Kelso & Tuller, 1984; Kelso, Tuller, & Harris,
1983; Turvey, Shaw, & Mace, 1973, for reviews), characterized by the temporal
stability of movements relative to a cycle and the independence of the
relative timing of movements from modulations in displacement or force. In fact,
this appears to be one of the main signatures of muscle-joint ensembles when
they cooperate to accomplish particular tasks.
¹This result has since been replicated for speakers of French, using a
somewhat more extended phonetic inventory and muscle set (Gentil, Harris,
Horiguchi, & Honda, 1984).

²Data from a different speaker (CH) are plotted in Tuller, Kelso, and
Harris (1982b), and a subset of data from a third speaker (BT) is plotted in
Tuller, Kelso, and Harris (1983).

³Four subjects × six utterance types × two measures of consonant
articulation, plus four subjects × two utterance types with one measure of
consonant articulation (48 + 8 = 56 cases).

⁴Recent work by Lubker (1983) suggests that for speakers of Swedish, the
timing of vowel and consonant movements is constrained as for the English
speakers.




ONSET OF VOICING IN STUTTERED AND FLUENT UTTERANCES 
Gloria J. Borden,† Thomas Baer, and Mary Kay Kenney††



Abstract. Electroglottographic (EGG) and acoustic waveforms of the
first few glottal pulses of voicing were monitored and voice onset
time (VOT) measured during an adaptation task performed by stutterers
and controls. The fluent utterances of stutterers resembled
those of control subjects. After dysfluencies, however, the EGG
signal increased gradually, lending physiological support to the
technique of "easy onset" of voicing. EGG waveforms also served to
help differentiate mild from severe stutterers. Idiosyncratic
ritualized laryngeal behavior, sometimes including physiological tremor,
was evident in the EGG record.

Physiological studies indicate that initiation of voicing presents
particular difficulties for stutterers. Aberrant laryngeal muscle activity
(Freeman & Ushijima, 1978) and inappropriate vocal fold positioning (Conture,
McCall, & Brewer, 1977) have been found. In addition to abnormally high muscle
activity, Freeman and Ushijima found that the usual reciprocity of laryngeal
adductor and abductor muscles disappears during stuttering episodes.
Conture and his colleagues observed that the vocal folds are fixed during
blocks in either a closed or open position. Many methods used to treat
stuttering accordingly emphasize "easy onset" of voicing. Van Riper's (1963)
technique of altering the preparatory set directed stutterers to start an
utterance from a state of rest. Webster's (1974) "Target-based Therapy" and
Weiner's (1978) "Vocal Control Therapy" are two of many approaches that direct
attention to the gradual onset of voicing. These techniques are supported by
numerous studies demonstrating the fluency-enhancing effects of conditions
(such as choral reading, delayed auditory feedback, metronome-timed speech,
and auditory masking) that result in altered phonatory states (Wingate, 1969,
presents a review). Also, stuttering episodes were found to become more
frequent when changes in voicing were increasingly required (Adams & Reis, 1974).

Even when judged to be fluent, stutterers have been found to be slower
than normals in initiating voicing during reaction time experiments (Adams &
Hayden, 1970; Cross & Luper, 1979; Starkweather, Hirschman, & Tannenbaum,
1976). Voice onset time (VOT) in CV combinations has also been found to be
longer in the perceptually fluent utterances of stutterers than in tokens
uttered by normal control subjects (Hillman & Gilbert, 1977), although there
have been findings that contradict or qualify the longer VOT results (Metz,
Conture, & Caruso, 1979; Watson & Alfonso, 1983). The inconsistency of results
in the VOT studies may be due to differences in the degree to which
sub-vocal blocks were successfully eliminated from the sample determined to be
perceptually fluent. Since the incidence of stuttering episodes is known to
be significantly higher at the beginning of a phrase than within it
(Bloodstein, 1975), the preparatory "set" does seem to be implicated, but the
question remains whether these preliminary adjustments are aberrant in
stutterers even when they are fluent. On average, stutterers are slower in
their speech than nonstutterers and are also slower in counting on their
fingers, but when separated into groups according to severity, a significant
difference was limited to the severe stutterers; mild stutterers were not
significantly slower than their controls (Borden, 1983).

†Also Temple University, Philadelphia, PA.

The phenomenon of adaptation in stuttering, in which the frequency of
stuttering episodes is usually reduced in repeated oral readings of the same
passage, was exploited in this study to provide examples of fluent and
stuttered tokens of an utterance for comparative purposes. In addition, we used
the technique of electroglottography (EGG), a useful, noninvasive method of
indirectly examining activity of the vocal folds. The recorded EGG signal is
the change in impedance across the vocal folds of an imperceptible
high-frequency current passing between electrodes placed on each side of the
thyroid prominence (Fourcin, 1974). To the degree that the vocal folds increase
contact with one another, impedance to the transmission of the signal decreases,
while glottal opening increases impedance. Thus, vocal fold movements may be
inferred from changes in impedance. Investigations comparing the EGG signal
with direct filming of the vocal folds have yielded information on landmarks
of the EGG waveform and their correlation with glottal opening, closing, and
peak contact (Baer, Löfqvist, & McGarr, 1983; Childers, Naik, Larar,
Krishnamurthy, & Moore, 1983; Rothenberg, 1981).

The purpose of this experiment (see Footnote 1) was to study the onset of
voicing in stutterers and their controls during an adaptation condition for
which they repeated 4-digit number series (such as 4253) five times each or
until judged fluent. Questions that we had in mind were: What can be
inferred about voice initiation from acoustic and EGG analysis

     ... in stuttered, aborted attempts to voice

     ... in successful voicing after a block

     ... in perceptually fluent utterances

     ... in normal speech of control subjects?

Initiation of voicing was analyzed by examining the acoustic and
electroglottographic waveforms of the first few glottal pulses of each of the
two number series and by measuring VOT from spectrographic recordings.
Method

Subjects

Eight adult stutterers (seven males and one female) aged 21-46 years were
matched by sex, age, and general educational/occupational level with eight
normal speakers aged 20-45. Mean age was 33 for the experimental group and 32
for the control group. College students, teachers, blue collar workers, and
professionals were represented in both groups. Subjects were bimodally
distributed in terms of the severity of their stuttering. Table 1 shows that
four of the stutterers were rated as mild and four as severe, according to the
Stuttering Severity Index (Riley, 1972), the reading and conversational parts
of the Stuttering Interview (Ryan, 1974), and subjective judgments of two
speech pathologists.

Table 1

Subjects for the adaptation study and their controls.

             SUBJECT    SEX    AGE    SEVERITY OF STUTTERING

Stutterers   1  JP      M      46     severe
             2  DE      M      22     severe
             3  DA      M      31     severe
             4  LB      M      44     mild
             5  DL      F      30     severe
             6  MA      M      26     mild
             7  GV      M      41     mild
             8  SL      M      21     mild
                           x̄ = 33

Controls     1  FS      M      45
             2  TS      M      22
             3  SB      M      30
             4  EC      M      43
             5          F      32
             6  JI      M      29
             7  AL      M      30
             8  DR      M      20
                           x̄ = 32


Task

Subjects were asked to count aloud from a visual digital display of two
different sequences of the digits 2, 3, 4, and 5. The sequences were 3425 and
4253. Subjects were instructed to say each sequence as quickly as possible,
without sacrificing accuracy, upon the sound of a response tone. They were
told to expect repetitions. Each series appeared five times, the first time 1
s before the signal to respond, and the last four times simultaneous with the
signal to respond. If the stutterers were not fluent by the fifth trial of
each series, they were instructed to repeat the number series until fluent.
All 8 of the control subjects, 3 of the 4 mild stutterers, and 1 of the severe
stutterers repeated each series 5 times for a total of 10 utterances from each
subject. The remaining mild stutterer and three of the four severe stutterers
repeated each series (4,10), (10,24), and (11,10) times, respectively.
One severe stutterer never fully adapted to the 4253 sequence after 24
trials.

Instrumentation

The program presenting the test sequences was run on a microcomputer
(Integrated Computer Systems). For each sequence, a visual warning signal was
followed by a variable interval (300, 400, or 500 ms), after which the 4-digit
display appeared. The tone signaling the subject to respond was delayed 1 s
after the first display of each series and was simultaneous with the display
for the repetitions. Presentation of each display was experimenter-controlled
to allow for subject differences in response time.

An electroglottograph (F-J Electronics ApS) recorded rapid changes in
impedance by high pass filtering (45 Hz-10 kHz) the overall changes in impedance
of a signal transmitted across the larynx at the level of the vocal folds.
The onset of these rapid oscillations was abrupt and unambiguous and served to
signal the onset of voicing during the adaptation task. The acoustic pressure
wave was simultaneously recorded through a microphone placed approximately 1
foot from each speaker. Lip/jaw movement was recorded from a small LED,
attached to the lower lip, that was exposed to an opto-electronic tracking
system; respiratory movements were recorded by a semi-hemispheric pneumograph.
The respiratory and lip/jaw recordings were not analyzed in detail for this
report (see Note 1).

Analysis of the Data

Visicorder graphs of the physiological and acoustic signals recorded on
FM tape were produced for each subject. The adaptation trial recordings were
inspected for any sign of dysfluency, such as abnormal fluctuations in
laryngeal impedance. The trials were then digitized from the analog tape for
further editing. The experimenters inspected each set of trials on a computer
monitor using a 100-ms time frame to magnify the first few periods of rapid
vibrations of the vocal folds, enabling a more detailed examination of the
electroglottographic and acoustic waveforms. Hard copies were made of the
first dysfluent and last fluent utterance for each series in the sample
collected from stutterers and of the first and last trial from each subject
who did not stutter.

In addition to the waveform recordings, sound spectrograms were produced
for all utterances during the adaptation series. Spectrograms
were generated to measure VOT of the utterance two, /tu/. VOT was measured
from the onset of the burst for /t/ to the first glottal pulse for /u/. If
the utterance was stuttered by repetition of /t/, the measure was taken from
the last burst to vowel onset. Measures in millimeters were converted to
milliseconds and averaged for each subject and across groups corresponding to
(1) utterances of control subjects, (2) fluent utterances of stutterers, and
(3) stuttered utterances. Speech rate was measured for the first and last
fluent sample for each speaker, yielding four measures (two number series)
that were then averaged. Measures were taken from the onset of voicing for
the first syllable to voice offset for the last syllable, thus eliminating the
ambiguous onset of the initial consonant. The measures in millimeters
were converted to milliseconds and divided by four for an average time
for each syllable. This time divided into 1000 ms yielded an average
syllable/second speech rate.
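
The speech-rate arithmetic can be sketched as follows. The chart speed used to convert millimeters to milliseconds is not given in the text, so `mm_per_second` is a hypothetical parameter, as are the function names and the sample values.

```python
def mm_to_ms(length_mm, mm_per_second):
    """Convert a length measured on the record to a duration in ms."""
    if mm_per_second <= 0:
        raise ValueError("chart speed must be positive")
    return 1000.0 * length_mm / mm_per_second


def syllables_per_second(utterance_ms, n_syllables=4):
    """Average rate: 1000 ms divided by the mean time per syllable."""
    if utterance_ms <= 0 or n_syllables <= 0:
        raise ValueError("duration and syllable count must be positive")
    ms_per_syllable = utterance_ms / n_syllables
    return 1000.0 / ms_per_syllable


def mean_rate(utterance_durations_ms):
    """Average the rates of the four measured utterances for a speaker."""
    rates = [syllables_per_second(d) for d in utterance_durations_ms]
    return sum(rates) / len(rates)


# Hypothetical speaker: 100 mm at an assumed 100 mm/s is 1000 ms for the
# four-syllable series, i.e. 250 ms per syllable or 4 syllables per second.
print(syllables_per_second(mm_to_ms(100, 100)))
```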

Analysis of the EGG and acoustic waveforms at voice onset was
qualitative. Quantitative measures of VOT differences between stutterers and
controls were averaged across fluent utterances and the standard deviations
computed. Spearman's rho correlation was used to test the relationship
between VOT and speech rate.
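
A rank correlation of this kind can be sketched as Pearson's r computed on ranks, with tied values given their average rank. This is a generic illustration rather than the authors' computation; the function names and the VOT and rate values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject values: mean VOT (ms) and rate (syllables/s).
vot_ms = [62, 70, 75, 81, 90]
rate = [5.1, 4.8, 4.9, 4.2, 3.9]
print(spearman_rho(vot_ms, rate))
```

A negative rho in such data would indicate that longer VOT goes with slower speech.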

Results

Electroglottographic and Acoustic Waveforms

Control subjects. The patterns of change in laryngeal impedance recorded
by the electroglottograph looked similar for all control subjects. Figure 1
represents the EGG and acoustic waveforms of a male voice initiating /ɔr/ in
the word four. The polarity of the signal for this analysis is set so that
upward deflection indicates the decreased impedance that accompanies increased
vocal fold contact, and downward deflection indicates the increased impedance
accompanying decreased vocal fold contact. Normally, vocal fold contact
increases more abruptly (a) than it decreases (b). There is a relatively stable
open phase (c). The EGG envelope grows rapidly in amplitude (d) relative to
the typical acoustic waveform for a vowel after /f/, a waveform that is more
gradual in buildup of the envelope (e). In previous studies, direct viewing
of vocal fold vibration simultaneous with EGG recordings has established these
landmarks of the impedance signal (Baer et al., 1983; Childers et al., 1983;
Rothenberg, 1981). It is difficult to determine the moment of glottal opening
as the folds peel apart during the downward slope of the signal, although
sometimes there is a "shoulder" in the downward slope that corresponds with
the appearance of a glottal aperture. Peak EGG is fairly reliable, however,
as an indication of maximum vocal fold contact, although it does not
necessarily indicate complete glottal closure. Occasionally one sees a cycle
of impedance that does not result in an acoustic pulse. This may reflect some
prevoicing laryngeal adjustment.

Stutterers when fluent. The first finding from inspection of the EGG
waveforms of stutterers during fluent utterances was that the waveforms looked
normal, with abrupt closing, gradual opening, a relatively stable open phase,
and a rapid buildup of the EGG envelope. Figure 2 shows the waveforms from a
male stutterer (severe) and his control and a female stutterer (severe) and
her control. All four samples are from the final trial of the series 4253,
showing onset of voicing in the word four. There is no obvious difference in
EGG and acoustic waveforms of stutterers when they are fluent and those of
normal speakers.

Stutterers when dysfluent. The second observation from the data on
stutterers was that when the stutterers (whether mild or severe) were dysfluent
(six of the eight subjects), voice initiation after a block was characterized
by a gradual instead of abrupt buildup of the EGG signal in all but one
of the subjects. Table 2 indicates the features observed. Two of the mild
stutterers evidenced gradual buildup of the EGG signal after a block until
adapted. In other respects the waveforms resembled those of control subjects
although the open phase for LB was brief. Severe stutterers when dysfluent,
however, differed from normal in several respects: a steeper decrease in vocal
fold contact, a less stable or prolonged open phase, and a more gradual
buildup of the EGG signal. These differences also disappeared upon adaptation.
One of the severe stutterers initiated voicing with normal looking EGG
whether dysfluent or fluent. The consonant/vowel ratio was reversed in
duration, however. During a dysfluent 3425, silence and consonant noise lasted
400 ms while voicing lasted 200 ms, in contrast with the fluent sample in
which the ratio reversed to 1:2 with pause and consonant time 200 ms and
voicing 100 ms. The rest of the stutterers evidenced gradual buildup of EGG
amplitude to initiate voicing after a block (Figure 3).

Figure 1. Electroglottographic (EGG) and acoustic records at voice onset in
the utterance "four" by a normal subject. The EGG waveform is
displayed with upward deflection indicating decreasing impedance. The
EGG waveform is characterized by steep rise (a) in 'vocal fold
contact' followed by slower 'opening' (b) and 'open phase' (c).
Amplitude of the first EGG pulses builds rapidly (d), compared with
the acoustic waveform (e).

ADAPTED SAMPLES FROM SEVERE STUTTERERS AND THEIR CONTROLS

Figure 2. Electroglottographic and acoustic waveforms of normal speakers on
the left and of stutterers, when fluent, on the right. The top
pair is for females; the bottom pair is for males. Stutterers,
when fluent, produce EGG waveforms that build rapidly in amplitude
like those of control subjects.

Table 2

Summary of characteristics of the electroglottographic waveforms. All
subjects showed rapidly increasing fold contact during each cycle (see
Figure 1.a). Thus, this factor did not distinguish mild from severe stutterers.
After a block, severe stutterers tended to show more abruptly decreasing vocal
fold contact (Figure 1.b) and less stable open phase (Figure 1.c) during the
vibratory cycle than mild stutterers, although the normally gradual decrease
in contact and open phase were restored when subjects were adapted. The
normally abrupt envelope of the EGG signal (Figure 1.d) was not present after
stuttering but reappeared when adapted. The occasional pre-voicing EGG cycle
occurred for a few stutterers and a few controls and did not serve to
distinguish one group from another.

CHARACTERISTICS OF THE ELECTROGLOTTOGRAPHIC WAVEFORMS

This gradual rise in EGG amplitude is a physiological index of "easy
onset of voicing." It is a more reliable indicator than the acoustic waveform,
because the sound is often graded in rise time due to an increase in front
cavity opening of the vocal tract and perhaps an increase in volume velocity
from subglottal air pressure. For an utterance such as four, the acoustic
waveform typically shows a graded envelope as the oral constriction for the
/f/ opens for the vowel. Normally, as we have seen, the EGG waveform is
abrupt in the rise time of its envelope, indicating that speakers position
their folds for voicing (not necessarily completely adducted) before the
aerodynamic forces act upon the folds to set them into vibration. The slow
rise time in EGG shown by two of the mild and three of the severe stutterers
is abnormal and adaptive. It is a strategy that stutterers apparently use to
initiate voicing when they are experiencing difficulty. The strong indication
is that under these circumstances the aerodynamic forces are brought into play
during a gradual posturing of the vocal folds for voicing, resulting in the
slow buildup of the EGG envelope seen in Figure 3. Furthermore, once the
stutterers are adapted or "fluent," the EGG envelope is abrupt like that of
the control subjects. This style of voice initiation does not seem to be used
routinely by stutterers but rather as a method for breaking the block.

Although the phenomenon of gradual EGG buildup was evident for both mild
and severe stutterers, two characteristics of the EGG waveforms were more
common among severe stutterers. Both the normally gradual decline in the signal
corresponding to gradual decrease in vocal fold contact as the folds peel
apart and the normally stable open phase are less prominent in the stuttered
trials of the adaptation task. Figure 4 shows this change. The somewhat
steeper decline in the EGG signal and the brief open phase before they snap
closed again as seen in the top part of the figure indicate that the folds in
the stutterer were more rigid than normal. Additional evidence of a change in
stiffness is the corresponding decline in fundamental frequency of the
waveforms when adapted. The vibration initiated after the block was 170 Hz
compared with 114 Hz upon adaptation. The bottom part of the figure shows the
EGG activity for the control subject.
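
The link between the spacing of EGG cycles and fundamental frequency can be sketched as below. This is an illustration under our own assumptions, not the authors' measurement procedure: it takes the times of successive EGG peaks (maximum fold contact) as given and converts the mean period to Hz; the function names and peak times are hypothetical.

```python
def f0_from_peak_times(peak_times_ms):
    """Mean fundamental frequency (Hz) from successive EGG peak times (ms)."""
    if len(peak_times_ms) < 2:
        raise ValueError("need at least two peaks")
    periods = [b - a for a, b in zip(peak_times_ms, peak_times_ms[1:])]
    if any(p <= 0 for p in periods):
        raise ValueError("peak times must be strictly increasing")
    mean_period_ms = sum(periods) / len(periods)
    return 1000.0 / mean_period_ms


def period_ms(f0_hz):
    """Glottal period in ms for a given fundamental frequency in Hz."""
    return 1000.0 / f0_hz


# Hypothetical peak times 10 ms apart correspond to 100 Hz.
print(f0_from_peak_times([0.0, 10.0, 20.0, 30.0]))
# A 170 Hz vibration has a period of just under 6 ms.
print(round(period_ms(170), 2))
```

A shorter period between peaks therefore corresponds directly to the higher fundamental frequency observed after the block.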

Another observation is the existence of highly ritualized
"break-the-block" behavior. One severe stutterer in our sample demonstrated a
3-stage laryngeal maneuver to initiate voicing that looked similar across
different utterances. Figure 5 shows the EGG pattern that accompanied the block
and final breaking of the block for the utterances three [θri] in 3425 and
four [fɔr] in 4253. When adapted, this subject had the same f0 at the
onset of voicing in both utterances, but the first part of the 3-stage ritual
used to break the block showed a much higher fundamental frequency. In the
trials shown in the figure, the first stage had an f0 of 470 Hz. It can also
be seen that as the EGG signal shows larger impedance changes, the
corresponding acoustic signal is gradually lowered in f0 and finally aborted.
The second stage is characterized by breathy low frequency vibrations whose
acoustic output is again choked off as the impedance changes widen in their
excursions. The third stage is always successful in that voicing is initiated
and maintained, although it is abnormally graded in its EGG envelope in
contrast to the adapted sample seen in the middle part of Figure 4. Except for
the graded EGG seen in the final stage, the rest of the break-the-block
strategy seems maladaptive, as voicing failed to be maintained.

Figure 3. Records from five subjects showing EGG activity following a
stuttering block. The more gradual build-up of the EGG envelope is
characteristic of most of the stutterers when they initiate voicing
after a block.

Figure 4. Voice initiation (onset of [ri] in three) in a severe stutterer and
his control. Top waveform is the EGG after a stuttering block with
the characteristic slow rise time. The relatively steep decrease
in vocal fold contact and brief open phase of the waveform is
accompanied by high fundamental frequency. The middle waveform is
the same utterance adapted to fluency with slightly longer open
phase, normal rise time of the first few pulses, and a lower
fundamental frequency. The waveform at the bottom of the figure is the
EGG signal for the same utterance by the control subject.



EGG AND ACOUSTIC WAVEFORMS FOR 'BREAK THE BLOCK' STRATEGY, SUBJECT JP











Figure 5. EGG (top) and acoustic (bottom) records associated with two
stuttering blocks for one subject. This idiosyncratic and
ritualized strategy for initiating voicing after a block is similar
despite differences in utterance. This figure shows one second of
time, but can be compared with the same subject in Figure 4, which
shows 100 ms of voice initiation for 3^^5.



The final observation from the EGG and acoustic data was the existence of
a physiological tremor that shows up on the EGG signal during voiceless
blocks. The laryngeal tremor is often phase-locked with an observable tremor
in the lower lip. These tremors were observed in two of the severe
stutterers. The subject (PA) represented on the top part of Figure 6 had a
9-Hz tremor and the subject (DL) on the bottom had a 7-Hz tremor. These
correspond with the lip tremors of 7-10 Hz that Fibiger (1971) recorded by EMG
from the facial muscles of stutterers. Physiological tremor has been linked to
heightened stretch reflex due to increased gamma motoneuron activity (Lippold,
1971). The data from the second subject show the 7-Hz lip tremor superimposed
upon a 1.^-Hz trial frequency, as the subject repeated [f]. Interesting
to note here is the normal temporal coordination of the lip/jaw system with
the laryngeal adductory system for these repeated trials, even though
stuttering is usually considered to be "uncoordinated."

Figure 6. Records of lower lip movement, EGG activity, and the acoustic
signals from two subjects. The top part of this figure shows a 9-Hz
physiological tremor in both lip and larynx as the subject prolongs
[r] in an effort to initiate voicing. The bottom part of the figure
shows several repetitions of [f] with lip lowering for each.
Superimposed upon these trials is a 7-Hz tremor in both lips and
larynx.

Voice Onset Time 

One index of the temporal coordination of laryngeal and supralaryngeal
behavior is the measurement of VOT (Lisker & Abramson, 1964) in syllables that
consist of a stop and a vowel. Measurements of the time between the burst for
/t/ and the onset of the voicing for /u/ in the utterance two were made for
all utterances, both stuttered and fluent, in the adaptation task. Figure 7
shows the results. Any utterance that showed aberrant laryngeal activity in
the EGG recording was eliminated from the "fluent" category. Thus, the
perceptually and physiologically fluent utterances of the mild stutterers were
well within normal limits of VOT. Two of the control subjects had considerably
longer VOT than the others, with one having a mean VOT of 80 ms and the
other 8^ ms. They also ranked seventh and eighth, respectively, in syllable
rate. There was no significant correlation between VOT scores and rate among
the normal speaking group as a whole (r^ ^ .^3), but extremely long VOT
scores corresponded with the slowest rates. The same finding held for the
fluent utterances of the experimental group. The correlation between VOT and
rate was low and lacked significance (r^ .?7), but at the extremes there
was some correspondence in that the subject with the shortest VOT (3^ ms) had
the fastest speaking rate (when fluent), while the subject with the longest
VOT (97 ms) had the slowest speaking rate. The severe stutterers in this
study had VOTs that varied depending on whether the block occurred on the
utterance two or elsewhere. If the block occurred elsewhere in the series of
four digits, VOT on /tu/ fell within normal limits, but if the moment of
stuttering fell at the juncture of the voiceless /t/ and the voiced /u/, then
VOT was either artificially shortened (as when the subject voiced the /t/) or
it was extremely long (when voicing became difficult to initiate).
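
The two computations described in this paragraph, VOT as the interval between the /t/ burst and the onset of voicing, and the correlation between VOT and speaking rate, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis actually run: the function names and every number in the usage lines are hypothetical, and Pearson's r is assumed as the correlation statistic.

```python
import math


def vot_ms(burst_ms, voicing_onset_ms):
    """Voice onset time: interval from the stop burst to the onset of voicing."""
    return voicing_onset_ms - burst_ms


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-subject values (mean VOT in ms, syllables per second).
mean_vots = [45.0, 52.0, 60.0, 80.0]
rates = [5.1, 4.8, 4.9, 3.9]
print(vot_ms(burst_ms=120.0, voicing_onset_ms=177.0))
print(round(pearson_r(mean_vots, rates), 2))
```

A negative r in such a table would mean that longer VOTs go with slower rates, which is the direction the text reports only at the extremes.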

These data do not suggest an overall deficit in VOT among stutterers
unless they are stuttering. The grand mean for all measures of VOT for control
subjects in the utterance /tu/ was ''7 ms with a standard deviation of 1/,
which corresponds closely with the mean VOT of the pooled fluent utterances of
stutterers of 56 ms with a standard deviation of 19 ms.



When dysfluent, it is in voice initiation that stutterers suffered
particular difficulty. Difficulties were manifested in silent blocks,
repetitions of the voiceless consonant preceding voicing, or short bursts of
voicing that were improperly initiated and were not maintained. After a
stuttering block, the most successful strategy for voice initiation was "easy
onset of voicing," evidenced by gradual growth of the EGG envelope. After
repeated trials, however, the adapted fluent samples were initiated with
abrupt EGG envelopes similar to those of the control speakers. VOT measured
from the fluent utterances of the stutterers did not significantly differ from
that of the controls. Taken together, the results of the VOT analysis and the
observations of EGG and acoustic waveforms indicate that, when fluent,
stutterers initiate voicing normally.

Figure 7. Voice onset time (VOT) as measured from sound spectrograms of the
fluent utterances [tu] of the stutterers during the adaptation task
and those of the control subjects. Each of the y control subjects
yielded 10 samples of [tu]. The mean VOT for each subject is noted
to the right of each histogram with standard deviations in
parentheses. VOT measures for the fluent utterances of stutterers
were similar to those for normal speakers. For some of the severe
stutterers fewer than 10 fluent samples were obtained since
perceptually fluent samples were omitted when abnormal fluctuations
in laryngeal impedance preceded voicing.



Although stuttering may reasonably be thought to be a disorder of timing,
the obvious temporal irregularities (abnormal VOT, repetitions, and
prolongations of sound or silence) may emerge from a problem that has more to
do with improper levels of activity than improper timing. The abnormalities of
motor coordination seen in stuttering may not be at essence a problem in
temporal coordination but rather a problem in the levels of activity of the
many muscles cooperating for a particular function, such as those that set the
position and tension of the vocal folds.

Evidence for this theory lies on one hand with the previously noted
abnormally high f0s in the aborted voicing trials of some of the
stuttering episodes, the less gradual opening phase and less stable open phase
of the rapid vocal fold vibrations, all of these factors indicating abnormal
stiffness, and on the other hand in the abnormally slow but extreme impedance
changes during some of the stuttering episodes indicating wide postural
excursions of either too much adduction or too much abduction to permit
successful voice initiation. Along with evidence that the settings for the
postural and tension prerequisites to voice initiation may be aberrant in
stuttering, there is evidence that some temporal coordination is maintained.
It is true that VOT as measured in the acoustic signal is abnormal during a
stuttering block, but the laryngeal and supralaryngeal systems involved show a
remarkable degree of temporal coordination in their movements. The product,
the sounds, is temporally disorganized due to difficulty in initiating
voicing, but the preparatory adjustments are time-locked, and in this sense
are well "coordinated." The physiological tremors seen in the laryngeal and
lip-jaw records from two of our subjects agree in frequency and tend to be
time-locked, and the trials or repetitions demonstrate remarkable temporal
bonding of the two systems. It may be that the timing of
laryngeal-supralaryngeal coordination is not the parameter at fault in
stutterers; rather, it may be that levels of the laryngeal activity previous
to voice onset or offset are faulty. Zimmermann and Hanley (1983) suggest that
in adaptation, background muscle activity in stutterers becomes stabilized as
arousal decreases.

When fluent, stutterers yielded VOTs well within normal limits. Reasons
that this study found no significant difference while other studies have found
longer VOT in fluent utterances of stutterers than in controls may be (1) that
the present study used physiological criteria as well as perceptual judgments
to categorize an utterance as "fluent" and (2) that repeating utterances until
fluent (adaptation) may be a more reliable method of obtaining a fluent sample
than picking "fluent" samples out of a corpus of stuttered and fluent speech.

The first report on this experiment (Borden, 1983) suggested that
stutterers when they are fluent are similar to their controls in initiating
speech. However, in executing a speech task, severe stutterers had a
significantly lower speech rate than controls. This finding indicates that
severe stutterers may require more time to make the ongoing adjustments and
transitions required in speaking fluently. The present study adds support to
the first report in that voice initiation seems normal as observed in
electroglottographic waveforms and as VOT measured from spectrograms when
stutterers are speaking fluently. When stutterers are dysfluent, however, the
folds may not move (the subject with the reversed CV durations), they may go
into tremor, or they may exhibit ritualistic patterns involving wide
excursions. When voicing is finally initiated successfully after a stuttering
block, it is usually by a strategy involving a gradual growth of vibratory
amplitude.

These data provide an empirical basis for the use of "easy onset of
voicing" techniques in therapy for stutterers. Our observations lead us to
caution, however, that easy onset may be revealed more reliably from
electroglottographic information than from acoustic waveforms. To the degree
that stutterers do initiate voicing normally when fluent, as indicated by the
data in this study, another implication for therapy might be that stutterers
may profit from enhancing their kinesthetic sense of prevoicing settings when
fluent and try to recapture that sense when they are having difficulty in
voice initiation.

References

Adams, M. R., & Hayden, P. (1976). The ability of stutterers and
nonstutterers to initiate and terminate phonation during production of an
isolated vowel. Journal of Speech and Hearing Research, 19, 290-296.

Adams, M. R., & Reis, R. (1974). Influence of the onset of phonation on the
frequency of stuttering: A replication and reevaluation. Journal of
Speech and Hearing Research, 17, 752-754.

Baer, T., Löfqvist, A., & McGarr, N. S. (1983). Laryngeal vibrations: A
comparison between high-speed filming and glottographic techniques.
Journal of the Acoustical Society of America, 73, 1304-1308.



Borden et al.: Onset of Voicing in Stuttered and Fluent Utterances



Bloodstein, O. (1975). A handbook on stuttering (Rev. ed.). Chicago:
National Easter Seal Society for Crippled Children and Adults.

Borden, G. J. (1983). Initiation versus execution time during manual and
oral counting by stutterers. Journal of Speech and Hearing Research, 26,
389-396.

Childers, D. G., Naik, J. M., Larar, J. N., Krishnamurthy, A. K., & Moore,
G. P. (1983, May). Electroglottography, speech, and ultra-high speed
cinematography. Proceedings of the International Conference on Physiology
and Biophysics of Voice.

Conture, E. G., McCall, G. N., & Brewer, D. W. (1977). Laryngeal behavior
during stuttering. Journal of Speech and Hearing Research, 20, 661-668.

Cross, D. E., & Luper, H. L. (1979). Voice reaction time of stuttering and
nonstuttering children and adults. Journal of Fluency Disorders, 4,
59-77.

Desmedt, J. E. (Ed.). (1978). Physiological tremor, pathological tremors and
clonus. Basel: S. Karger.

Fibiger, S. (1971). Stuttering explained as a physiological tremor.
Quarterly Progress and Status Report (Speech Transmission Laboratory, Royal
Institute of Technology, Department of Speech Communication), 1-23.

Fourcin, A. (1974). Laryngograph examination of vocal fold vibration. In
B. Wyke (Ed.), Ventilatory and phonatory control mechanisms (pp. 315-333).
London: Oxford University Press.

Freeman, F. J., & Ushijima, T. (1978). Laryngeal muscle activity during
stuttering. Journal of Speech and Hearing Research, 21, 538-562.

Hillman, R. E., & Gilbert, H. R. (1977). Voice onset time for voiceless stop
consonants in the fluent reading of stutterers and nonstutterers.
Journal of the Acoustical Society of America, 61, 610-611.

Lippold, O. C. J. (1971). Physiological tremor. Scientific American, 224,
65-73.

Lisker, L., & Abramson, A. S. (1964). A cross-language study of voicing in
initial stops: Acoustical measurements. Word, 20, 384-422.

Metz, D. E., Conture, E. G., & Caruso, A. (1979). Voice onset time,
frication, and aspiration during stutterers' fluent speech. Journal of Speech
and Hearing Research, 22, 649-656.

Riley, G. (1972). A stuttering severity instrument for children and adults.
Journal of Speech and Hearing Disorders, 37, 314-322.

Rothenberg, M. (1981). Some relations between glottal air flow and vocal
fold contact area. In C. L. Ludlow & M. O. Hart (Eds.), Proceedings of
the Conference on the Assessment of Vocal Pathology. ASHA Reports, 11.

Ryan, B. P. (1974). Programmed therapy for stuttering in children and adults.
Springfield, IL: Charles C Thomas.

Starkweather, C. W., Hirschman, P., & Tannenbaum, R. S. (1976). Latency of
vocalization onset: Stutterers versus nonstutterers. Journal of Speech
and Hearing Research, 19, 481-492.

Van Riper, C. (1963). Speech correction: Principles and methods (4th ed.).
Englewood Cliffs, NJ: Prentice-Hall.

Watson, B. C., & Alfonso, P. J. (1983). Foreperiod and stuttering severity
effects on acoustic laryngeal reaction time. Journal of Fluency
Disorders, 8, 183-206.

Webster, R. L. (1974). A behavioral analysis of stuttering: Treatment and
theory. In K. S. Calhoun, H. E. Adams, & K. M. Mitchell (Eds.),
Innovative treatment methods in psychopathology. New York: Wiley.

Weiner, A. (1978). Vocal therapy for stutterers: A trial program.
Journal of Fluency Disorders.







Wingate, M. E. (1969). Sound and pattern in artificial fluency. Journal of
Speech and Hearing Research, 12, 677-686.

Zimmermann, G. N., & Hanley, J. M. (1983). A cinefluorographic investigation
of repeated fluent productions of stutterers in an adaptation procedure.
Journal of Speech and Hearing Research, 26, 35-42.

Footnote 



*The main aim of the overall experiment, of which this paper is the
second report, was to examine the interaction of respiratory, laryngeal, and
supralaryngeal movements of stutterers and their controls during speech. The
first report (Borden, 1983) focused on initiation time and execution time for
speech and manual counting tasks. The present report focuses on voice onset,
and the third report will address coordination.

PHONETIC INFORMATION IS INTEGRATED ACROSS INTERVENING NONLINGUISTIC SOUNDS
D. H. Whalen and Arthur G. Samuel†



Abstract. When the fricative noise of a fricative-vowel syllable is
replaced by a noise from a different vocalic context, listeners
experience delays in identifying both the fricative and the vowel
(Whalen, 1984). Mismatching the information in the fricative noise
for vowel and consonant identity with the information in the vocalic
segment appears to hamper processing. This effect was argued to be
due to phonetic integration of the information relevant to
categorization. The present study was intended to eliminate an alternative
explanation based on acoustic discontinuities. Noises and vowels
were again cross-spliced, but, in addition, the first 60 ms of the
vocalic segment (which comprised the consonant-vowel transitions)
either had a nonlinguistic noise added to it or was replaced by that
noise. The fricative noise and the majority of the vocalic segment
were left intact, and both were quite identifiable. Mismatched
consonant information caused delays both for original stimuli and for
ones with the noise added to the transitions. Mismatched vowel
information caused delays for all stimuli, both originals and ones
with the noise. Additionally, syllables with a portion replaced by
noise took longer to identify than those that had the noise added to
them. When asked explicitly to tell the added versions from the
replaced, subjects were unable to do so. The results indicate that
listeners integrate all relevant information even across a
nonlinguistic noise. Replacing the signal completely delayed
identifications more than adding the noise to the original signal. This
was true despite the fact that the subjects were not aware of any
difference.

Phonetic information is spread throughout the acoustic signal. This is
true even in the case of fricative-vowel syllables, where it might seem that
there are two invariant cues. In such syllables, there are two distinct
acoustic segments: a noise that can be identified in isolation as the
fricative, and a vocalic segment that can independently specify the vowel.
Nonetheless, there is vowel information in the fricative noise (Whalen, 1983;
Yeni-Komshian & Soli, 1981), and fricative information in the vocalic formant
transitions (Harris, 1958; Mann & Repp, 1980; Whalen, 1981). Thus one of
the most promising cases of context-independent phonetic cues turns out to
vary contextually.

There is also evidence from cross-splicing studies, however, that
listeners can detect information specifying the original context of the noise
and of the vocalic segment. A series of reaction time studies (Whalen, 1984)
indicated that subjects are sensitive to all the information in the syllable.
In that work, listeners were presented with edited fricative-vowel stimuli
containing mismatches between information in the fricative noise and
information in the vocalic segment. Listeners were slower to identify both the
consonants and the vowels of the syllables with mismatches, suggesting an
attempt to integrate that information, even though the information was not
necessary to identify the phones. This was true whether the mismatch was
between information about place of articulation in the transitions and in the
noise, or between vowel information in the noise and in the vocalic segment
itself. It was also true whether the subjects were identifying the vowel or
the fricative.

The present experiments were designed to clarify the interpretation of
that work. In particular, there was a possibility that some relatively
uninteresting psychoacoustic discontinuity in the previous stimuli accounted
for the reaction time data. That is, since the stimuli were (digitally)
edited, there could have been abrupt changes in the spectrum at the cut,
possibly causing a purely auditory disruption of processing. This possibility
was less likely in one experiment (Whalen, 1984, Experiment 5) in which, even
though the fricative noise was separated from the vocalic segment by 60 ms of
silence (thus distancing the two spliced portions), the delay caused by
mismatching information remained. However, it is conceivable that the inserted
silence failed to displace an auditory trace of the fricative noise. If this
trace did not match the vocalic segment, subjects could have perceived a
discontinuity. Thus the data do not completely rule out an auditory
discontinuity account of the reaction time results.

The present experiments attempt to replicate the slowing effect of
mismatches in cases where it is clear that an auditory discontinuity account
cannot hold. To that end, the temporal progression of the syllable was left
intact (that is, no silence was introduced), but the location of the digital
splice coincided with the imposition of a nonlinguistic noise. This noise
(either a naturally produced cough or a synthesized buzz) occurred during the
vocalic formant transitions, comprising the first 60 ms of the vocalic
segment. If the previously obtained delays were mere auditory distractions,
then the mismatch effects should disappear: the auditory disturbance at the
boundaries of the noise should be the same for syllables with matched and with
mismatched fricative noises and vocalic segments. If, however, listeners do in
fact integrate information across the whole syllable, then the effect should
remain.



Experiment 1 

Experiment 1 examined a mismatch of information for fricative place of
articulation, between the information in the vocalic formant transitions and
that in the noise itself. We will call this a mismatch of consonant
information, even though the transitions (as the name implies) provide
information about both the consonant and the vowel. The nonlinguistic noise
(the natural cough or the synthetic buzz) was introduced in one of two ways.
For both matched and mismatched versions, the 60 ms of the vocalic segment
that constituted the transitions either had the nonlinguistic noise digitally
added (the "added" stimuli), or were replaced by the nonlinguistic noise (the
"replaced" stimuli). The added noise was expected to mask the transitions
somewhat, presumably reducing the effect of mismatched information if a mere
auditory distraction was the cause. However, if the more global, phonetic
interpretation is correct, the mismatch should be just as strong when there is
noise added to the signal as when the mismatch is the only complicating
factor. The replaced stimuli, however, would not have transitions present, and
therefore should show no effect of the cross-splicing.

Two different noises were used to reduce the possibility of some
unexpected acoustic artifact. We wanted syllables to be perceived as
interrupted in a way that allowed what might be called "phonetic" restoration
(after Warren's, 1970, phonemic restoration). That is, listeners should be
able to assume that there was a signal behind the noise, even in the replaced
stimuli. Both noises were primarily aperiodic but with some periodic shaping,
a combination most likely to produce phonemic restoration (Samuel, 1981b).

Procedure 

Natural tokens of the syllables [sa], [ʃa], [su], and [ʃu] were recorded
by a male speaker of English. (The speaker was not the same as in Whalen,
1984.) The tokens were digitized (20 kHz sampling rate, 9.6 kHz low-pass
filtered), and test items were selected so that:

1. All fricative noises were of the same duration (160 ms).

2. All vocalic segments were of the same duration (340 ms).

3. Each syllable token was used either for its fricative noise or for
its vocalic segment; thus every test syllable had an electronic
splice in it.

Two tokens of each category (e.g., the [s] from [sa]) were used.

Two different nonlinguistic noises were used. One was 60 ms of a
naturally produced cough, while the other was 60 ms of a buzz consisting of a
semi-periodic filtering of white noise with peaks at intervals of 300 Hz.

Five copies of each digitized syllable were made. One of these (the
"original") was intact except for the digital splice between the fricative
noise and the vocalic segment (as described above). Two "added" and two
"replaced" versions were constructed. In the "added" versions, the cough noise
or buzz noise was added digitally to the first 60 ms of the vocalic segment.
In the "replaced" versions, the first 60 ms of the vocalic segment were
completely replaced by the cough or buzz.
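
The difference between the "added" and "replaced" versions can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the editing software actually used: waveforms are represented as plain lists of sample values, the function names are hypothetical, and only the 20 kHz sampling rate and the 60 ms noise duration are taken from the text.

```python
SAMPLE_RATE_HZ = 20_000  # sampling rate stated in the text
NOISE_MS = 60            # duration of the cough or buzz stated in the text


def noise_length_samples(sample_rate_hz=SAMPLE_RATE_HZ, noise_ms=NOISE_MS):
    """Number of samples spanned by the nonlinguistic noise."""
    return sample_rate_hz * noise_ms // 1000


def make_added(vocalic, noise):
    """'Added' version: sum the noise with the start of the vocalic segment."""
    n = len(noise)
    if n > len(vocalic):
        raise ValueError("noise is longer than the vocalic segment")
    mixed = [v + x for v, x in zip(vocalic[:n], noise)]
    return mixed + list(vocalic[n:])


def make_replaced(vocalic, noise):
    """'Replaced' version: overwrite the start of the vocalic segment."""
    n = len(noise)
    if n > len(vocalic):
        raise ValueError("noise is longer than the vocalic segment")
    return list(noise) + list(vocalic[n:])


# Toy sample values, purely illustrative.
vocalic = [1, 2, 3, 4]
noise = [10, 20]
print(noise_length_samples())
print(make_added(vocalic, noise))
print(make_replaced(vocalic, noise))
```

In both versions the total duration is unchanged and everything after the noise is identical, so the two differ only in whether the transitions survive underneath the noise.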

For all three types of stimuli ("original," "added," and "replaced"),
half of the syllables had vocalic segments matched with the fricative noise
(e.g., the [u] from [su] paired with an [s] noise) and half had mismatched
ones (e.g., the [u] from [ʃu] paired with an [s] noise). Note that when the
nonlinguistic noise replaced the first 60 ms of the vocalic segment, there was
very little left to be mismatched. That is, even though the rest of the
vocalic segment came from an inappropriate syllable, the transitions were, by
design, mostly completed by 60 ms. Thus there should not have been much of a
mismatch in the "replaced" stimuli. The first column of Table 1
shows the construction of the matched stimuli, while the second column shows
the construction of the mismatched stimuli. The match/mismatch factor, the
five noise conditions (original; added and replaced for two types of noise),
and the two tokens of the four fricative and vowel categories result in eighty
stimuli.
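
The count of eighty follows from fully crossing the factors just listed (2 x 5 x 2 x 4). A minimal sketch of that enumeration is given below; the labels are illustrative names chosen here, with "sha" and "shu" standing in for the syllables beginning with the palatal fricative.

```python
from itertools import product

# Factor levels as described in the text; the label strings are illustrative.
MATCH = ("matched", "mismatched")
NOISE_CONDITIONS = (
    "original",
    "cough added",
    "cough replaced",
    "buzz added",
    "buzz replaced",
)
TOKENS = (1, 2)
SYLLABLES = ("sa", "sha", "su", "shu")

# Every combination of the four factors is one stimulus.
stimuli = list(product(MATCH, NOISE_CONDITIONS, TOKENS, SYLLABLES))
print(len(stimuli))
```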



Table 1

Construction of the Stimuli

Syllable   Matched          Consonant Mismatch   Vowel Mismatch
Heard      (Exp 1 & 2)      (Exp 1)              (Exp 2)

           noise   voc.     noise   voc.         noise   voc.

"sa"       s[a] + [s]a      s[a] + [ʃ]a          s[u] + [s]a

"ʃa"       ʃ[a] + [ʃ]a      ʃ[a] + [s]a          ʃ[u] + [ʃ]a

"su"       s[u] + [s]u      s[u] + [ʃ]u          s[a] + [s]u

"ʃu"       ʃ[u] + [ʃ]u      ʃ[u] + [s]u          ʃ[a] + [ʃ]u

Note: Each column presents the syllables used to construct the stimulus
syllables. The portion of each syllable enclosed in brackets was digitally
excised.



In each of two conditions, subjects heard randomized sequences containing
five repetitions of each stimulus over headphones. The interstimulus interval
was /SOU ms. Subjects were asked, in one condition, to identify the vowel
("a" or "u") as quickly as possible. In the other condition, they were asked
to identify the consonant ("s" or "sh") as quickly as possible. The order of
these conditions was balanced across subjects, as was the determination of
which button was pushed by the dominant hand. Responses under 100 ms were
counted as mistakes, and the equipment was forced to give up waiting for an
answer after J'^i.^J ms. Missing responses and mistakes in identification
were tallied separately for the consonant judgments ('j.Oi) and the vowel
judgments. These trials were not included in the reaction time analyses.
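
The screening of trials described above (anticipations under 100 ms treated as mistakes, unanswered trials dropped, errors excluded from the reaction time analyses) might be sketched as follows. Only the 100 ms criterion comes from the text; the function name, the trial representation, and the timeout passed in the usage line are hypothetical.

```python
ANTICIPATION_MS = 100  # responses faster than this count as mistakes (from the text)


def screen_trials(trials, timeout_ms):
    """Return the reaction times that enter the analysis.

    trials: list of (rt_ms, correct) pairs, where rt_ms is None when no
    response was registered. timeout_ms is a hypothetical response deadline.
    """
    kept = []
    for rt_ms, correct in trials:
        if rt_ms is None or rt_ms > timeout_ms:
            continue  # missing response
        if rt_ms < ANTICIPATION_MS:
            continue  # anticipation, counted as a mistake
        if not correct:
            continue  # identification error
        kept.append(rt_ms)
    return kept


# Illustrative trials only; the timeout value here is an assumption.
trials = [(450, True), (80, True), (None, False), (520, False), (610, True)]
print(screen_trials(trials, timeout_ms=2000))
```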

The subjects were 20 Yale students who were paid for their participation.
All were native speakers of English with no reported hearing difficulties.

Results and Discussion



Figure 1 shows the reaction times in Experiment 1 for the first two
factors of interest. Overall, mismatches of consonant information, seen in
the left pair of bars, slowed identification significantly, F(1,19) =
?J.T9, p < .0^1, as is seen in the three bars to the right. Adding the noise
caused an 8 ms delay, and replacing the transitions with the noise caused an
additional delay.

Figure 2 shows the interaction of consonant information mismatch and
extraneous noise. In each pair of bars, one bar shows the mean reaction
time to stimuli with matched consonant information. The cross-hatched bar
shows the responses to stimuli with mismatched consonant information. The
most important result is apparent in the middle pair of bars. Even though
both matched and mismatched stimuli include acoustic discontinuities (in the
form of the nonlinguistic noises), the mismatch is still robust, F(1,19) =
22.32, p < .001, for just the "added" stimuli. The comparison of these bars
with the two leftmost shows that the addition of the noise slowed judgments an
average of 8 ms; the mismatch of transitions added 2^ ms whether the noise
was present or not.

As can be seen from the rightmost pair of bars, and from the plot of the
differences between bars on the right, the difference between matched and
mismatched stimuli is negligible in the replaced stimuli (a nonsignificant
difference of 2 ms). Not only is there an interaction between added/replaced
and match/mismatch, F(1,19) = 12.76, p < .01, but a separate analysis of the
replaced data alone shows no effect of mismatch, F(1,19) = 0.32, n.s. As
predicted, there is not enough transitional information left after 60 ms for a
mismatch to be detected.
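
The interaction reported here is a difference between two mismatch effects: the mismatch cost in the added stimuli minus the mismatch cost in the replaced stimuli. The sketch below shows only that arithmetic. The cell means in the usage lines are invented for illustration (chosen so that the replaced stimuli show a 2 ms difference, as in the text); they are not the reported means, and the function names are hypothetical.

```python
def mismatch_effect(matched_ms, mismatched_ms):
    """Slowing attributable to the mismatch within one noise condition."""
    return mismatched_ms - matched_ms


def interaction_contrast(added, replaced):
    """Difference of mismatch effects; each argument is (matched, mismatched)."""
    return mismatch_effect(*added) - mismatch_effect(*replaced)


# Invented cell means in ms, for illustration only.
added_means = (500.0, 520.0)
replaced_means = (521.0, 523.0)
print(mismatch_effect(*added_means))
print(mismatch_effect(*replaced_means))
print(interaction_contrast(added_means, replaced_means))
```

A contrast near zero would mean the mismatch costs the same with or without the transitions; a clearly positive value is the pattern the significant interaction describes.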

The effect of mismatch was the same whether the consonant or the vowel
was identified, F(1,19) = 1.97, n.s., for the interaction. Reaction times did
not vary due to the type of nonlinguistic noise, F(1,19) = 0.61, n.s., nor did
type of noise interact with any other factors.

The previously obtained slowing of reaction time with mismatched
information was found even when explicitly nonlinguistic (in a sense, purely
auditory) discontinuities were present. The effect on identification was not
weakened by any masking of the transitions that might have occurred: The
phonetic relevance of the transitions was still perceived. It is still
conceivable that there are two auditory discontinuities at work (the
transitions and the nonlinguistic noises) and that they do not interfere with
each other. Experiment 2 examines a situation where this interpretation is not
plausible.

Note that the identification times for the replaced stimuli are essentially
the same as for the mismatched added stimuli (see Figure 2): The delay
caused by a mismatch is the same as the delay caused by the absence of the
original signal. One interpretation of this is that appropriate transitions
facilitate identification, and that mismatched transitions are no worse than
having no transitions at all. Alternatively, the similarity in mean reaction
times might be coincidental. Experiment 2 provides an opportunity to test
these alternatives while examining the effect of mismatching vowel
information.

Experiment 2

Experiment 2 mismatched the vowel information in the fricative noises
with that of the vocalic segment. Manipulations similar to those of
Experiment 1 were carried out, but with a different expectation: Mismatches of
phonetic information should show up even in the replaced stimuli. This is
based on the fact that the vowel mismatch does not depend just on the first 60
ms of the vocalic segment, but is instead present throughout the noise, on the
one hand, and the vocalic segment, on the other.
Procedure

The syllable pieces of Experiment 1 were again used in Experiment 2,
although the combinations for the mismatched stimuli were different. The
matched stimuli were identical (see Column 1 in Table 1). The mismatched
syllables are outlined in the third column of Table 1. The transitions were
always appropriate to the fricative, i.e., the consonant information was
matched. The same five noise conditions as in Experiment 1 were used in
Experiment 2: no noise, cough or buzz added to the first 60 ms of the vocalic
segment, or cough or buzz replacing those 60 ms.

The stimuli were presented as before, with the two conditions of consonant
identification and vowel identification, each presented as a separate
block. Missing responses and mistakes in identification accounted for H.f^ of
the consonant judgments and 3.1% of the vowel judgments. These trials were
excluded from further analysis.

The subjects were 20 Yale students who were paid for their participation.
All were native speakers of English with no reported hearing difficulties. 
Half had participated in Experiment 1. 

Results and Discussion 

Figure 3 presents the results of mismatching vowel information and of
including noise in the stimuli. The two bars at the left indicate that
mismatching vowel information had a significant slowing effect of 2^ ms,
F(1,19) = 46.90, p < .001. The three bars on the right indicate that adding
noise slowed judgments by 29 ms, while replacing part of the syllable with
noise slowed judgments by an additional lb ms, F(4,76) = 29.73, p < .001. All
three of these categories were significantly different from each other.

Figure 4 shows the results by both match and noise condition. In each
pair of bars, the open bar shows the mean reaction time to stimuli with
matched vowel information. The cross-hatched bar shows the responses to
stimuli with mismatched vowel information. Unlike Experiment 1, vowel
mismatches caused delays in each case; the effect of mismatches did not differ
across these conditions, F(4,76) = 0.?7, n.s. If anything, these delays
increased with the presence of noise, as is shown by the plot on the right.
This plot shows the differences between the matched and mismatched stimuli for
the no noise, noise added, and noise replaced stimuli, respectively, from left
to right.

There was one interaction between the match/mismatch factor and the
category identified (consonant or vowel). The mismatch effect was
approximately twice as large when the vowel was identified (1^ ms for
consonant identification, 31 for vowel), F(1,19) = 6.73, p < .05. A separate
analysis of the consonant identification data alone shows that the effect of
mismatch was still significant, F(1,19) = 9.57, p < .01.

The main effect of noise type was not significant, F(1,19) = 1.1?, n.s.,
nor did it enter into any significant interactions.

As in Whalen (1984), mismatching the rather weak vowel information in the
fricative noise with the more powerful information in the vocalic segment
slowed phonetic judgments. Even though a nonlinguistic noise indicated that
the signal had been corrupted, listeners were still affected by mismatches
between two temporally separated portions of the utterance. The present
experiment is particularly interesting because the information critical to the
mismatch was not removed when nonlinguistic noise replaced the transitions
(as it was in Experiment 1). The "replaced" stimuli in Experiment 2
demonstrated that even when [illegible] include significant acoustic
discontinuities, the [illegible] of mismatching phonetic information
persists: The identification delays are due to an impairment of the process
that integrates relevant information, and not to simple distractions caused by
auditory discontinuities.



One difference between the experiments is the absolute amount of time
it took for the [illegible] were, on the whole, 6fl ms slower
in Experiment 2. [Illegible] (with the factors used before plus the
[illegible] ten subjects who participated in both experiments [illegible])
[illegible] a real one; the effect of Experiment was [illegible]. The only
interaction of the Experiment factor [illegible], which was expected, since
the effect of [illegible] conditions in Experiment 1 but not in Experiment 2.
In [illegible], 30% of the stimuli had detectable mismatches; in the second,
50% did. This increase of [illegible] in more cautious identifications
[illegible].



The finding that the [illegible] of vowel information and the addition
[illegible] were independent allows us to choose between [illegible] results
of Experiment 1. In that experiment, [illegible] transitions slowed
identification, or [illegible] information speeded identification. The
[illegible] for syllables with mismatched information [illegible] transitions
left both possibilities [illegible]. [Illegible] mismatched information slowed
identification [illegible] or not. These results indicate that [illegible]
whether [illegible].

In [illegible], replaced stimuli produced significantly longer reaction times
than added stimuli. One possible explanation for this effect is that the
replaced items are heard as interrupted or discontinuous and that this
[illegible] enough to slow them down. A more likely explanation, given that
phonetic integration occurs across the noise, is that the perceptual system
expects the signal even when nonlinguistic noises are present, and that
perceptual processing is slowed when this expectation is not met. Experiment 3
tests whether there are noticeable differences between added and replaced
stimuli that would support the "distracting" hypothesis. The test involves
explicitly asking the subjects to discriminate between added and replaced
stimuli. If the subjects are being distracted by the replacement of the
[illegible], then added and replaced stimuli should be discriminable.
Procedure

The stimuli were the "added" and "replaced" items used in the first two
experiments. Ninety-six tokens were used in Experiment 3, representing the
crossing of four factors: (1) buzz versus cough as extraneous noise, (2)
added versus replaced, (3) matched, consonant mismatched, or vowel mismatched,
and (4) tokens. The last factor, tokens, represents the eight examples within
each cell of the design, and includes two instances each of /sa/, /ʃa/, /su/,
and /ʃu/.

The stimuli used in Experiments 1 and 2 were recorded on audiotape and
digitized on another computer system, using high-quality audio components and
a 12-bit A/D converter. The sampling rate was 20 kHz, with 9.6 kHz low-pass
filtering.

The entire stimulus set of 96 items was presented twice. The first 48
stimuli spanned all of the factors just described except "added versus
replaced." The form of each token ("added" versus "replaced") was randomly
selected. The second set of 48 stimuli included the "other" form ("replaced"
if the "added" form of a token had already been presented, and vice versa).
The second pass through the 96 stimuli used the same procedure. Each group of
48 tokens was randomly ordered.
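
The design and presentation order just described can be made concrete with a
short Python sketch. This is an illustration only, not the software actually
used: the function names, the seed, and the labeling of the two instances of
each syllable as 1 and 2 are assumptions introduced here.

```python
import itertools
import random

NOISES = ("buzz", "cough")
FORMS = ("added", "replaced")
MATCHES = ("matched", "consonant mismatched", "vowel mismatched")
SYLLABLES = ("sa", "sha", "su", "shu")
# Eight tokens per cell: two instances of each of the four syllables.
# The instance labels 1 and 2 are hypothetical.
TOKENS = [(syllable, instance) for syllable in SYLLABLES for instance in (1, 2)]


def full_design():
    """Cross the four factors: 2 noises x 2 forms x 3 match types x 8 tokens."""
    return list(itertools.product(NOISES, FORMS, MATCHES, TOKENS))


def one_pass(rng):
    """One pass through all 96 items as two randomly ordered groups of 48.

    The first group holds one randomly chosen form of every
    noise/match/token combination; the second holds the other form.
    """
    first, second = [], []
    for noise, match, token in itertools.product(NOISES, MATCHES, TOKENS):
        form = rng.choice(FORMS)
        other = FORMS[1 - FORMS.index(form)]
        first.append((noise, form, match, token))
        second.append((noise, other, match, token))
    rng.shuffle(first)
    rng.shuffle(second)
    return first + second


def session(seed=0):
    """Two passes through the stimulus set (192 trials in all)."""
    rng = random.Random(seed)
    return one_pass(rng) + one_pass(rng)
```

Under these assumptions each subject hears every one of the 96 items twice,
and within any group of 48 trials no noise/match/token combination occurs in
both its "added" and its "replaced" form.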

Subjects were told that they would be hearing "sa," "sha," "su," and
"shu," with some noise present during each syllable. It was explained that
the noise would occur "where the consonant met the vowel," and that the noise
would either replace a small bit of the syllable, or be superimposed on it.
Subjects were instructed to press one button on a computer terminal if they
thought the noise replaced part of a syllable, and another button if they
thought the noise was superimposed.

The presentation of stimuli was subject-paced: Approximately one second 
after a subject's response was received, the next stimulus was presented. The
entire procedure took approximately 15 minutes. 

Twelve individuals served as subjects in Experiment 3. They were 
recruited through sign-up sheets posted at Yale University, and were paid for
their participation. All were native English speakers with no reported
hearing problems. Half had previously participated in another study in which they
made similar judgments. 

Results and Discussion 

The central question of Experiment 3 is whether listeners can 
discriminate the "added" and "replaced" versions of the syllables. To answer 
this question, the percentage of correct responses was calculated for each
subject, broken down by matching condition (match, consonant mismatch, vowel
mismatch), extraneous noise (buzz or cough), and stimulus form ("added" or
"replaced"). These percentages were used to calculate the signal detection
parameter d'. This bias-free measure of discrimination performance was
computed for each of the six cells defined by the crossing of the three match- 
ing conditions and the two extraneous noises. These values were submitted to 
a two-factor analysis of variance (matching condition X extraneous noise). 
The results of this analysis can be summarized very simply: Subjects are
utterly unable to discriminate "added" and "replaced" stimuli. In signal
detection analyses, a d' of 0 indicates no discriminability, with increasing
values reflecting an ability to discriminate. The obtained grand mean d' was
-0.003, indicating that the "added" and "replaced" stimuli could not be
discriminated at all. Given this, it should not be surprising that neither
extraneous noise type, F(1,11) < 1, nor matching condition, F(2,22) = 2.84,
n.s., made a significant difference; their interaction was similarly
inconsequential, F(2,22) = 1.31, n.s.
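
For readers unfamiliar with the measure, the computation of d' from the two
percent-correct scores in a cell can be sketched as follows. This is a minimal
illustration, not the analysis program used here. It assumes the standard
equal-variance definition d' = z(hit rate) - z(false alarm rate), it treats a
"replaced" response to a replaced stimulus as a hit, and it clips rates of 0
or 1 by 1/(2N); the clipping rule in particular is an assumption, since the
text does not say how extreme scores were handled.

```python
from statistics import NormalDist


def d_prime(pc_replaced, pc_added, n_trials):
    """d' from proportion-correct scores on the two stimulus forms.

    pc_replaced: proportion of replaced stimuli correctly called "replaced"
    pc_added:    proportion of added stimuli correctly called "added"
    n_trials:    number of trials behind each proportion
    """
    floor = 1.0 / (2 * n_trials)  # assumed correction for rates of 0 or 1

    def clip(p):
        return min(max(p, floor), 1.0 - floor)

    hit_rate = clip(pc_replaced)
    # Calling an added stimulus "replaced" is a false alarm.
    false_alarm_rate = clip(1.0 - pc_added)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)
```

A listener who is correct on half of the trials of each form yields a d' of 0,
as does a listener who simply responds "replaced" on 90 percent of all trials
(90 percent correct on replaced items, 10 percent correct on added items);
this is the sense in which the measure is free of response bias.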

What makes this null result of interest is that the "added" and "replaced"
stimuli produced significantly different reaction times in Experiments
1 and 2. We thus have a situation in which a manipulation that is totally
unavailable to consciousness produces reliable differences in processing time.
The extra acoustic discontinuity produced by the replacement manipulation is
sufficient to slow down identification of the speech signal (Experiments 1 and
2), but is not discriminable from the mere addition of noise (Experiment 3).

The inability of listeners to discriminate between the "added" and
"replaced" items when they are explicitly asked to do so is reminiscent of
results obtained in studies of the phonemic restoration effect (Samuel, 1981a).
An important difference to note, however, is that in studies of restoration,
care is taken to remove all local cues to a phone; if the stretch of speech
immediately before or immediately after the replacement locus is played, the
relevant phone will not be heard. In the present study, both the fricative
and the vowel are perfectly intelligible in isolation; only the transitions
are replaced (or have noise added). Thus, there is not enough evidence to
tell whether the present results reflect some sort of restoration. A better
analogy might be to the classic categorical perception findings (cf. Liberman,
Cooper, Shankweiler, & Studdert-Kennedy, 1967). In these studies listeners
also fail to discriminate between acoustically different tokens (ones within a
phonemic category). Moreover, just as in the present study, reaction time
analyses of identification times reveal differences between these
indiscriminable items (Pisoni & Tash, 1974). The reaction time analyses thus
provide insights into the processing of speech that cannot be revealed in overt
discrimination tasks.

General Discussion

The phonetic mismatch effects of Whalen (1984) were successfully
replicated, even with stimuli containing a nonlinguistic noise, inviting the
auditory system to block integration of portions of the signal. The present
study also shows that having the original signal behind the noise is less
disruptive than replacing the signal altogether. This indicates that the
perceptual system looks for coherence even within competing noise. The
results of this search for coherence are not available to consciousness, as is
shown in Experiment 3.

It appears then that listeners are indeed sensitive to all phonetic
information given them, and that the delays caused by mismatches, even those
that cannot be readily heard, are due to increased phonetic processing. Even
when the subject is given every excuse for failing to integrate, as when a
nonlinguistic noise occurs in the middle of the signal, she still does
integrate. The mismatch adds just as much time to the perceptual process
whether the extraneous noise is present or not. This indicates that the
previously obtained result is not simply a short-term psycho-acoustic
disruption but is sustained over a relatively long stretch. Whether the
information stored is acoustic or (weakly) categorical remains to be seen.
PARAMETERS OF SPECTRAL/TEMPORAL FUSION IN SPEECH PERCEPTION*



Bruno H. Repp and Shlomo Bentin†



Abstract. When the distinctive formant transition of a synthetic
syllable is presented to one ear while the remainder (the "base") is
presented to the opposite ear, listeners report hearing the original
syllable in the ear receiving the base, a phenomenon called
"spectral/temporal fusion" by Cutting (1976). We have found that the
mere onset (i.e., the first pitch pulse, 10 ms in duration) of an
isolated, contralateral third-formant (F3) transition can be sufficient
to cue the /da/-/ga/ distinction in this way. We also varied
the relative onset times of isolated F3 and base, and compared three
types of F3 segments (50-ms time-varying, 50-ms constant, 10-ms onset)
under both dichotic and diotic presentation. Time-varying F3
segments were superior to constant ones, especially when they lagged
behind the base. Diotic performance exceeded dichotic performance,
but only when F3 preceded the base, suggesting that upward spread of
masking occurred in diotic presentation when F3 coincided with energy
in the lower formants. Perhaps most interestingly, subjects'
tolerance of temporal asynchrony (roughly ±50 ms) was about the same
in dichotic and diotic conditions, suggesting that the temporal
integration mechanism that combines phonetic information from the
isolated F3 segment and the base operates similarly in both
conditions.

It has long been known that perceptual fusion results when the first
formant (F1) of a synthetic speech signal is presented to one ear while the
higher formants are simultaneously presented to the other ear (Broadbent, 1955;
Broadbent & Ladefoged, 1957). In this situation, listeners perceive a single,
fused stimulus localized toward the side of F1 (cf. Darwin, Howell, & Brady,
1978). A variant of this paradigm was introduced by Rand (1974), who
presented only the time-varying F2 and F3 transitions of CV syllables to one
ear while F1 and the steady-state portions of F2 and F3 were presented to the
opposite ear. The perceptual fusion that occurs in this situation has been
labeled "spectral/temporal fusion" by Cutting (1976).

Spectral/temporal fusion has received considerable attention in recent
years. Research on "duplex perception" (Bentin & Mann, 1983; Liberman, 1979;
Liberman, Isenberg, & Rakerd, 1981; Mann & Liberman, 1983; Nusbaum, Schwab,
& Sawusch, 1983; Repp, Milburn, & Ashkenas, 1983) has focused on the fact
that, simultaneously with the speech, the isolated formant transition is
perceived as a nonspeech "chirp." Thus the isolated transition contributes to
phonetic and nonphonetic percepts at the same time, which has been interpreted
as evidence for the simultaneous operation of a speech-specific and a general
auditory mode of perception (Liberman, 1982; Liberman et al., 1981; Mann &
Liberman, 1983). Recent studies have shown that the speech and nonspeech
percepts in this situation are affected in different degrees by manipulations
such as masking or attenuation of the distinctive isolated transition (Bentin
& Mann, 1983).

In the present studies, we are not directly concerned with duplex
perception as such. Rather, we focus on the speech percept only and examine
some of the factors that may limit the occurrence of fusion in this special
situation. By "fusion" we mean here the contribution of the isolated
transition to speech identification. The strict definition of fusion as a
single stimulus percept from two separate inputs clearly does not apply in
duplex perception. In Experiment 1, we examine how long the distinctive
isolated formant transition must be to enable listeners to discriminate
between two alternative syllables when attending to the ear receiving the
nondistinctive base. Experiment 2 is a parametric study of the effects of
temporal asynchrony on spectral/temporal fusion, including comparisons of
dynamic and static "transitions," and of dichotic versus diotic presentation.

Experiment 1

All previous studies of spectral/temporal fusion have followed the
standard paradigm described above. In each case, a complete formant transition
was presented to the ear contralateral to the base, although the duration of
the isolated transition varied from 30 to 70 ms across different studies. In
the present study, we wished to determine, first, whether the full transition
is needed to make the speech distinction, or whether a truncated version or
even just the onset of the transition would suffice. Second, we asked whether
the presence of the steady-state continuation of the same formant in the base
is a necessary condition for spectral/temporal fusion to occur. The second
half of the term, "spectral/temporal," suggests that an affirmative answer was
assumed by Cutting (1976). To test this inference, we omitted from the base
the steady-state resonance following the critical transition, expecting (on
the basis of pilot observations) that fusion would nevertheless be obtained.
(A direct comparison of conditions with and without this steady-state formant
in the base was conducted in Experiment 2.)

The materials used were the syllables /da/ and /ga/, synthesized so as to
differ only in the F3 transition. Earlier studies have obtained strong
spectral/temporal fusion with similar stimuli (Mann & Liberman, 1983; Repp et
al., 1983). The experimental manipulation in Experiment 1, then, was to
reduce the duration of the isolated F3 transition (appropriate for either /da/
or /ga/) until only its onset (i.e., the first pitch pulse) remained, while a
constant two-formant base was presented in synchrony to the opposite ear.
Spectral/temporal fusion was assessed in terms of subjects' ability to
distinguish /da/ and /ga/ in the ear receiving the base.
Subjects. Twelve subjects (three males, nine females) were tested. They
were Yale undergraduates and were paid for their participation.

Stimuli. The stimuli were three-formant synthetic approximations of the
syllables /da/ and /ga/, produced on the parallel software synthesizer at
Haskins Laboratories, and are illustrated schematically in Figure 1. The first
two formants were identical in both syllables, and constituted the "base." The
duration of the base was 250 ms with a 50 ms amplitude ramp at onset and a
constant fundamental frequency of 100 Hz for the first 100 ms, followed by a
linear decrease to 80 Hz at offset. The first formant began at 279 Hz and
increased linearly in frequency during the first 50 ms to a steady state of
[illegible] Hz. The second formant began at 1650 Hz and decreased linearly in
frequency during the first 50 ms to a steady state of 1230 Hz. The base by
itself is perceived as either /da/ or /ga/ or as ambiguous, depending on the
listener. The /da/ third-formant transition, originally 50 ms (5 pitch pulses)
in duration, began nominally at 2800 Hz and decreased linearly in frequency to
2550 Hz, while the /ga/ transition began nominally at 1800 Hz and increased
linearly in frequency to 2550 Hz. (These are the "dynamic" transitions in
Figure 1; the actual F3 frequencies in the first pitch pulse were 2775 and
1875 Hz, respectively; see caption to Figure 1.) Five transition durations
were used, as indicated by the tick marks in Figure 1: 50, 40, 30, 20, and
10 ms (5, 4, 3, 2, and 1 pitch pulses, respectively). Since the frequency
trajectory was not changed, the shorter transitions had offset frequencies
increasingly closer to the onset frequencies.

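
The relation between the nominal and the actual F3 frequencies can be checked
with a few lines of Python. This is a sketch for illustration only and is not
the synthesizer's own procedure; the function names are hypothetical. It
assumes 10-ms pitch pulses (F0 = 100 Hz) and, following the caption to
Figure 1, a frequency held constant within each pulse at the midpoint of the
nominal linear trajectory for that pulse.

```python
PULSE_MS = 10  # one pitch period at the 100-Hz fundamental


def pulse_frequencies(onset_hz, offset_hz, full_ms=50, kept_ms=50):
    """Actual F3 frequency (Hz) in each pitch pulse of a transition.

    The nominal trajectory runs linearly from onset_hz to offset_hz over
    full_ms; only the first kept_ms of it are retained. Each pulse takes
    the value of the nominal trajectory at the pulse's temporal midpoint.
    """
    slope = (offset_hz - onset_hz) / full_ms  # Hz per ms
    n_pulses = kept_ms // PULSE_MS
    return [onset_hz + slope * (i * PULSE_MS + PULSE_MS / 2)
            for i in range(n_pulses)]


def nominal_offset(onset_hz, offset_hz, full_ms=50, kept_ms=50):
    """Nominal frequency (Hz) reached at the end of a shortened transition."""
    return onset_hz + (offset_hz - onset_hz) * kept_ms / full_ms
```

With the values given above, `pulse_frequencies(2800, 2550, kept_ms=10)`
returns `[2775.0]` for /da/ and `pulse_frequencies(1800, 2550, kept_ms=10)`
returns `[1875.0]` for /ga/, in agreement with the first-pulse frequencies
stated in the text. Under the same assumptions, a /ga/ transition shortened to
20 ms ends nominally at 2100 Hz rather than 2550 Hz.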

The stimuli were recorded onto magnetic tape, with the isolated F3
transitions on one channel and the onset-aligned, constant base on the other.
There were 240 stimuli altogether: 24 repetitions of the /da/ and /ga/
transitions at each of 5 durations. The stimuli were arranged in 5 randomized
sequences, with ISIs of 2.5 s between stimuli and longer intervals between
sequences.


Procedure. The tapes were presented at a comfortable intensity over
TDH-39 earphones in a quiet room. The base was always in the left ear and the
transition was in the right ear. (No pronounced ear asymmetries have been
observed in this task.) Subjects were instructed to listen to their left ear
and to identify the syllables in writing as beginning with either "d" or "g."

Results and Discussion 

Performance for 50-, 40-, and 30-ms transitions was nearly perfect: 96,
97, and 98 percent correct, respectively. For 20-ms transitions, performance
dropped to 91 percent correct, and for 10-ms transition onsets, to 84 percent
correct. Individual subjects' scores in the last condition ranged from 66 to
5. percent correct. Thus, although there was some loss in accuracy, even the
10-ms single pitch-pulse transition onsets were sufficient to distinguish /da/
from /ga/ in the opposite ear. Accordingly, a time-varying frequency
in F3 does not seem essential either for this particular phonetic distinction
or for spectral/temporal fusion to occur.
Figure 1. Schematic illustration of the center frequencies of the first three
formants in the stimuli of Experiments 1 and 2. All formant
transitions are drawn as idealized linear functions connecting the
nominal frequencies used in synthesis. The formant frequencies
were actually constant within each pitch pulse at values halfway
between the nominal onset and offset frequencies for that 10-ms
period. The "dynamic" transitions were used in both experiments;
the tick marks indicate the shortening manipulation in Experiment
1. The "static" F3 segments were used in Experiment 2 only. The
dashed line represents the F3 steady state present in the base on
half of the trials in Experiment 2.
In addition, it is clear that the absence of the F3 steady state in the
base did not prevent fusion. Since temporal continuity in the relevant
frequency band thus seems to contribute little (see also Experiment 2),
spectral/temporal fusion appears to be just a special case of spectral fusion
(Cutting's, 1976, term for the fusion of complete formants presented
simultaneously to different ears). The difference lies in that only the former
situation gives rise to a duplex percept (syllable and "chirp"); the mechanism
that reconstitutes the speech percept from separate components, however,
seems to be the same.

It might be argued that the subjects accomplished their task by paying
attention to the chirp-like isolated transition and responding "g" when the
chirp was low-pitched and "d" when it was high-pitched (cf. Nusbaum et al.,
1983). Even though no catch trials were employed in the present study, this
possibility is virtually ruled out by previous evidence that (1) subjects do
attend to the ear receiving the base when instructed to do so (Mann &
Liberman, 1983; Repp et al., 1983), and (2) they are unable to associate
isolated F3 chirps consistently with the response categories "d" and "g" (Repp
et al., 1983). Moreover, all listeners agree that the syllables in the ear
receiving the base really do sound alternately like /da/ or /ga/. Therefore,
the present subjects' responses almost certainly reflect the combination of
information from the two ears.

It may be noted that a 10-ms F3 onset is not only devoid of time-varying
information but is also nonperiodic, consisting only of a single glottal
cycle. By itself, it sounds like a click. Informally, we have confirmed that
fusion is also obtained when this 10-ms pitch pulse is replaced with a 10-ms
burst of noise with the same spectral envelope, generated by the aperiodic
source of the synthesizer. This observation reveals a possible similarity
with a phenomenon reported by Pastore, Szczesiul, Rosenblum, and Schmuckler
(1982), who found that a burst of filtered white noise changed the perception
of a contralateral /pa/ to /ta/. These findings indicate that dichotic
integration of phonetic information can occur even if the signal in one ear is
periodic and the other is not. It is not clear whether such phenomena should
be attributed to general processes of auditory fusion. Rather, they may
constitute evidence for a central phonetic decision mechanism that operates on
inputs from both ears.

Experiment 2 

To explore in more detail the parameters of spectral/temporal fusion, we
conducted a multifactorial experiment including four independent variables:
(1) a range of onset asynchronies between the isolated F3 segment and the
base; (2) dichotic versus diotic presentation; (3) static (constant frequency)
versus dynamic (time-varying frequency) F3 segments; and (4) bases with
and without a steady-state F3.

Effects of stimulus onset asynchrony (SOA) on spectral/temporal fusion
were studied by Cutting (1976) with synthetic two-formant stimuli. The
isolated F2 transition was 70 ms in duration. Cutting used transition-base
lag times of up to 160 ms, spaced in logarithmic steps, but reported
his results averaged over leads and lags, since he found no significant
asymmetry. As expected, speech identification performance dropped as SOA
increased. However, performance was still slightly above chance even at the
longest interval (160 ms), although the statistical significance of this
finding was not determined. The longest interval at which performance was
substantially above chance was 10 ms.

In a recent study, Bentin and Mann (1983, Exp. 1) used SOAs of up to 100
ms with two-formant syllables similar to Cutting's, although the transitions
were only 50 ms in duration. Only lead times were used; that is, the F3
segment always preceded the base. Subjects' performance declined steadily with
increasing SOA, but was still above chance at the 100-ms interval. These
results are consistent with Cutting's in that they suggest a considerable
tolerance of temporal asynchrony in spectral/temporal fusion.

In the present study we sought to replicate these findings with stimuli
distinguished by a difference in the F3 transition. Particular attention was
given to possible performance asymmetries between lead and lag times.
Cutting's (1976) negative finding notwithstanding, such asymmetries might be
predicted on at least two grounds. First, when the F3 segment lags behind the
onset of the base and thus coincides with the vowel, it may suffer some
contralateral simultaneous masking that is absent when the F3 segment precedes
the base. Second, when the F3 segment lags behind, listeners may conceivably
be able to classify the base phonetically before processing the F3 segment.
Both considerations predict stronger fusion when the F3 segment leads the base
than when it lags behind. On the other hand, one might also predict the
opposite: It is known that, in auditory perception, the terminal frequency of a
tone glide is more salient than its initial frequency (Nábělek, Nábělek, &
Hirsh, 1970; Schwab, 1981). If a leading F3 segment is retained in auditory
memory before it is integrated with the base, its distinctiveness might be
reduced because full /da/ and /ga/ transitions have the same terminal
frequency. This may confer a relative advantage on lagging F3 segments, which
need not be stored in auditory memory.


A second comparison in Experiment 2 concerned dichotic versus diotic
presentation of the stimulus components. Rand (1974) conducted such a
comparison for onset-synchronous transition and base and found better speech
discrimination in the dichotic condition. He attributed this to simultaneous
masking of higher by lower formants in the diotic condition, and to release
from this form of peripheral upward spread of masking in the dichotic
condition. Subsequent studies (e.g., Danaher & Pickett, 1975; Nye, Nearey, &
Rand, 1974; Nearey & Levitt, 1974) have replicated this difference, although
there are also negative findings in the literature (Nusbaum et al., 1983;
Repp et al., 1983). This is the first study to vary SOA in such a comparison.
If upward spread of masking operates, then the advantage of dichotic over
diotic performance should hold at all lag times, as long as the F3 segment
coincides with the base. However, no such difference should exist at lead
times, unless there is significant peripheral backward masking of the F3
segment by the base, which seems unlikely.

Another question of interest was whether listeners would be equally
tolerant of stimulus onset asynchronies in diotic and in dichotic presentation.
Presented monotically or diotically, onset-synchronous transition and
base constitute, of course, an intact syllable. It has not been attempted
previously to advance or delay the isolated transition with respect to the
base when both occur in the same channel. At least one dichotic fusion
phenomenon (the influence of a contralateral white noise burst on the perceived
place of articulation of a stop consonant) does not seem to occur when the
stimulus components are presented diotically (Pastore et al., 1982). We
considered it possible that fusion of transition and base in the diotic
condition might be restricted to short SOAs, where there is physical overlap,
whereas in the dichotic condition subjects might be less sensitive to temporal
asynchronies.

A third comparison of interest concerned the nature of the F3 segment
conveying the distinctive information. Three kinds of F3 segments were
compared: (1) standard time-varying ("dynamic") F3 transitions; (2) short
10-ms onsets (as in Experiment 1); and (3) 50-ms constant ("static") F3
segments, which were obtained by extending the transition onset frequencies,
as illustrated in Figure 1. The static F3 segments were of special interest:
First, would they be sufficient to cue the /da/-/ga/ distinction? (The
effectiveness of the short F3 segments in Experiment 1 suggests a positive
answer.) Second, would they be as effective as dynamic F3 segments, or does
the dynamic information convey additional phonetic distinctiveness? Third, the
static F3 segments for /da/ and /ga/ have distinctive terminal (as well as
initial) frequencies, which may be an advantage at F3 lead times. Up to a
lead time of 40 ms, the distinctive end of a static F3 segment actually still
overlaps with the onset of the base. As a result, performance at short lead
times may be better for static than for dynamic F3 segments, unless the
distinctive phonetic information derives strictly from F3 onset and physical
overlap is irrelevant. Comparisons with the short F3 segment should also be
enlightening in that regard, although the short duration of this stimulus
entails a loss in energy and a consequent decrement in discriminability.

In addition to these three major factors (SOA, mode of presentation, and
type of F3 segment), the experiment also included a comparison of bases with
and without an F3 steady state. Since Experiment 1 had shown strong fusion in
the absence of an F3 steady state, little effect of this last factor was
expected.

Methods 

Subjects. Twelve paid volunteers participated, six men and six women.
Five of them had been subjects in Experiment 1. Of the other seven, two had
to be replaced because of exceedingly poor performance.

Stimuli. The basic stimuli were the same as in Experiment 1. In addition
to the base used there, a second base was used that included a steady-state
F3 at 2540 Hz, starting 50 ms after the onset of F1 and F2, at the same time
as the steady states of these formants. (The vowel had very nearly the same
quality with and without F3.) There were three kinds of F3 segments: The
dynamic (50 ms) and short (10 ms) versions corresponded to the extremes of
transition duration used in Experiment 1; the static (50 ms) F3 segments were
synthesized at constant frequencies corresponding to the nominal onset
frequencies of the dynamic segments (see Figure 1).

Three stimulus tapes were recorded, each corresponding to a different
type of F3 segment. Each tape contained 10 blocks of 22 stimuli, each block
being a randomization of the two F3 segments for /da/ and /ga/ recorded on one
track, at 11 different SOAs in relation to the base on the other track. The
11 SOAs were: -100, -70, -40, -20, -10, 0, 10, 20, 40, 70, and 100 ms; a
negative SOA means that the F3 segment led the base. In addition,
odd-numbered blocks contained the base without F3, while even-numbered blocks
contained the base with a steady-state F3. The ISI was 2 s, and there were
6 s between blocks.
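
The tape layout described above can be sketched in a few lines of Python. This is only an illustration of the design (2 F3 segments x 11 SOAs = 22 stimuli per block, base type alternating by block); the function name, the tuple layout, and the fixed random seed are assumptions and not part of the original procedure.

```python
import random

# SOAs in ms; a negative value means the F3 segment leads the base.
SOAS_MS = [-100, -70, -40, -20, -10, 0, 10, 20, 40, 70, 100]
F3_SEGMENTS = ["da", "ga"]  # the two F3 segments recorded on a given tape


def make_tape(n_blocks=10, seed=0):
    """Return a list of blocks; each block is a shuffled list of 22 trials."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    tape = []
    for block_no in range(1, n_blocks + 1):
        # Odd-numbered blocks: base without F3; even-numbered: base with F3.
        base = "without_F3" if block_no % 2 == 1 else "with_F3"
        trials = [(seg, soa, base) for seg in F3_SEGMENTS for soa in SOAS_MS]
        rng.shuffle(trials)
        tape.append(trials)
    return tape


tape = make_tape()
print(len(tape), len(tape[0]))  # 10 22
```

Each F3 segment thus occurs once per block at every SOA, and five times per tape with each type of base.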

Procedure. Each of the three stimulus tapes was presented in two
conditions: dichotic and diotic. All six conditions were presented in a
single session. The order of conditions was strictly counterbalanced across
subjects, with the constraint that all diotic conditions either preceded or
followed all dichotic conditions.

A brief familiarization sequence with dynamic F3 segments at SOA = 0 was
presented at the beginning of the session. This sequence included 10 stimuli
in which /da/ and /ga/ alternated, followed by a random arrangement of 20
stimuli. The sequence was first presented diotically and then dichotically.
The subjects tried to identify the syllables and were given feedback after
the sequence. If more than a few errors were committed, the sequence was
presented a second time.

Subjects were run individually under the same conditions as in Experiment
1. The tape recorder channels were calibrated for equal intensity of a
repeated vowel. Diotic presentation was achieved by mixing the two channels
together and feeding the result to both earphone channels. No intensity
adjustment was made; because of the relative weakness of the F3 segment, the
increase in the total amplitude of the mixed syllables over the isolated base
was minimal. In the dichotic conditions, the F3 segment was presented to the
right ear for half of the subjects and to the left ear for the other half.
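
The difference between the two presentation modes can be illustrated with a small Python sketch. It treats the two tape tracks as equal-length lists of samples, which is an assumption made for illustration; the function and argument names are hypothetical, and the original mixing was done in analog hardware, not in software.

```python
def present(f3_track, base_track, mode, f3_ear="right"):
    """Return (left, right) sample lists for one stimulus.

    f3_track and base_track are equal-length lists of samples (the two
    tape tracks). The names and the list representation are assumptions.
    """
    if mode == "diotic":
        # Mix the two tracks and feed the same mixture to both ears,
        # with no intensity adjustment.
        mixed = [a + b for a, b in zip(f3_track, base_track)]
        return mixed, list(mixed)
    if mode == "dichotic":
        # Keep the tracks separate: F3 segment to one ear, base to the other.
        if f3_ear == "right":
            return list(base_track), list(f3_track)
        return list(f3_track), list(base_track)
    raise ValueError("mode must be 'diotic' or 'dichotic'")


left, right = present([0.1, 0.2, 0.0], [1.0, 1.0, 1.0], "diotic")
print(left == right)  # True
```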

The structure of the stimuli and of the test tapes was explained to the
subjects in advance. They were asked not to rely on the high or low pitch of
the F3 segment and to focus their attention on the speech percept only. A
forced choice between "d" and "g" responses was required for each stimulus.

Results 

The main results are shown in Figure 2, where the percentage of correct
consonant identifications is plotted as a function of SOA (abscissa), type of
F3 segment (separate functions), and presentation condition (separate panels).
A 5-way repeated-measures analysis of variance was conducted that included, in
addition to the three factors just mentioned, type of base and high/low F3 as
factors; that is, the statistical analysis was conducted on "g" responses (or
equivalently, "d" responses), not on percent correct. In this analysis, all
effects with respect to percent correct are interactions involving the
high/low F3 factor.
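
The relation between the two measures can be made explicit with a minimal sketch. It assumes that "g" is the correct response to low-F3 stimuli and "d" to high-F3 stimuli, and that the two stimulus types occur equally often; the function name is hypothetical.

```python
def percent_correct(pct_g_low_f3, pct_g_high_f3):
    """Percent correct from the percentage of "g" responses to each F3 type.

    Assumes "g" is correct for low F3, "d" is correct for high F3, and
    equal numbers of low-F3 and high-F3 trials.
    """
    correct_low = pct_g_low_f3            # "g" responses to low F3 are correct
    correct_high = 100.0 - pct_g_high_f3  # "d" responses to high F3 are correct
    return (correct_low + correct_high) / 2.0


print(percent_correct(80.0, 30.0))  # 75.0
```

Under these assumptions percent correct equals 50 plus half the difference between the two "g" percentages, so any factor that changes percent correct does so by changing the size of the high/low F3 effect, which is why it surfaces as an interaction with that factor.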

The first result evident from Figure 2 is that SOA had a clear effect:
Performance decreased as SOA increased in either direction, F(10,110) = 32.07,
p < .0001. A second clear effect is that of type of F3 segment: Performance
was generally best for the dynamic F3 segments and poorest for the short F3
segments, F(2,22) = 11.02, p < .0005. Performance for the short F3 segments
at SOA = 0 in the dichotic condition was a good deal worse than in Experiment
1, for reasons that are not obvious. The third main effect evident from the
figure is that, unexpectedly, performance in the diotic condition was higher
than in the dichotic condition, F(1,11) = 7.06, p < .03.

Figure 2. Percent correct as a function of SOA, separately for dichotic and
diotic conditions, with type of F3 segment as parameter.



Because of the general convergence of scores at the extremes of the SOA
range, interactions with SOA also reflect main effects, at least in part.
These interactions were highly significant for both type of F3 segment,
F(20,220) = 5.51, p < .0001, and presentation condition, F(10,110) = [value
illegible], p < .0001. Despite this latter interaction, listeners' tolerance
of SOAs seemed similar in the two presentation conditions. No other effects
on percent correct were significant.

Some more detailed differences in Figure 2 are not directly captured by
the statistical analysis but deserve attention. First, in the dichotic
condition performance was generally best at SOA = 0, as expected, but in the
diotic condition, optimal performance was at short negative SOAs. Second, the
effect of SOA was generally asymmetric, though more so in the diotic than in
the dichotic condition: Performance was generally better when the F3 segment
led the base than when it lagged behind. This was especially true for the
longest intervals used: At -70 and -100 ms of SOA, performance was clearly
above chance (p < .05 for 11 of 12 conditions by sign test), whereas scores
were near chance at 70 and 100 ms of SOA (p < .05 for only 1 of 12
conditions). Indeed, the absence of any decline in performance between -70
and -100 ms of SOA suggests an asymptote that may reflect an effect other than
spectral/temporal fusion, such as a response bias contingent on the perceived
pitch of the F3 segment. Third, it may be noted that the superiority of
dynamic over static F3 segments did not hold at lead times of -40 ms or more,
and that the superiority of static over short F3 segments was much more
pronounced at negative than at positive SOAs.
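
For readers unfamiliar with the sign test mentioned above, the following sketch shows how such a test against chance can be computed. The text does not say whether the test was one- or two-sided or how ties were handled; the sketch assumes a one-sided test across the 12 subjects with no ties, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_above, n_subjects):
    """One-sided probability of at least n_above subjects scoring above
    chance if each subject were equally likely to fall above or below it."""
    tail = sum(comb(n_subjects, k) for k in range(n_above, n_subjects + 1))
    return tail / 2 ** n_subjects


for n_above in (9, 10):
    print(n_above, round(sign_test_p(n_above, 12), 4))
# 9 0.073
# 10 0.0193
```

Under these assumptions, a condition reaches p < .05 only if at least 10 of the 12 subjects score above 50% correct.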

One consequence of the differential asymmetry of the effect of SOA in the
dichotic and diotic conditions is that diotic performance exceeded dichotic
performance primarily at short F3 segment lead times. This is especially
clear from Figure 3, where the difference between diotic and dichotic scores
is plotted. It is also evident that this difference is similar for all three
types of F3 segments. (The relevant interaction was not significant.)

The statistical analysis revealed several additional effects that related
specifically to the percentage of "g" (or "d") responses, rather than to
percent correct. Figure 4 shows the percentage of "g" responses as a function
of SOA, high/low F3, and type of base; the scores are averaged over the three
types of F3 segment and the two presentation conditions. Naturally, there
were more "g" responses to stimuli including the low F3 than to stimuli
including the high F3, F(1,11) = 166.84, p < .0001. It is also evident that
the effect of the low F3 segment, which increased "g" responses when
effective, was larger than that of the high F3, which decreased "g" responses,
so that the total number of "g" responses varied significantly with SOA,
F(10,110) = [value illegible], p < .0001. Of course, the interaction of
high/low F3 and SOA was highly significant; it corresponds to the main effect
of SOA on percent correct, reported above. It may also be noted that the
asymmetry around SOA = 0 at short SOAs, deriving mainly from the diotic
condition (cf. Figure 2), was pronounced only for low-F3 stimuli; the effect
of SOA for high-F3 stimuli was more nearly symmetric. The asymmetry at long
SOAs was equally present for both types of stimuli, however.

An unexpected result evident in Figure 4 is that, overall, more "g"
responses were given when the base contained a steady-state F3, F(1,11) =
17.13, p < .002. The presence of a steady-state F3 apparently enhanced the
spread of energy following the release, which is characteristic of velar
consonants preceding back vowels. This difference was more pronounced at long
than at short SOAs (F(10,110) = 7.16, p < .0001, for the interaction), which
confirms that the effect originated in the base. However, the effect also
interacted with type of F3 segment, F(2,22) = 10.18, p < .0007, being
strongest with the short F3 segments and weakest with the dynamic F3 segments.
Thus, the most effective F3 segments also were able to overcome most
effectively the bias inherent in the base itself. A triple interaction
between type of presentation, SOA, and high/low F3 was also obtained,
F(10,110) = 3.44, p < .0006, suggesting that the bias was overcome more
effectively by the F3 segments in the diotic condition. The differential SOA
asymmetry in the two presentation conditions may also have contributed to this
interaction.

Three additional significant interactions in the analysis of variance
(between mode of presentation and high/low F3, between type of F3 segment and
high/low F3, and between mode of presentation, type of F3 segment, and SOA)
essentially parallel effects on percent correct described earlier and
therefore need not be discussed any further.

Figure 3. Difference between diotic and dichotic scores (Figure 2) as a
function of SOA.

Discussion

Experiment 2, in conjunction with Experiment 1, investigated three
factors that were expected to play a role in spectral/temporal fusion of
speech stimuli: (1) structural properties of the isolated formant transition
and of the base; (2) temporal asynchrony between the transition and the base;
and (3) dichotic versus diotic presentation.

It is now clear that the isolated transition need not actually be a
transition for fusion to occur. A steady-state formant with the same onset
frequency, or even only the first pitch pulse of the transition, can be
sufficient, although the dynamic frequency transition does seem to convey
additional information. Moreover, the base need not contain any continuation
of the isolated F3 segment in the form of a steady-state F3. Experiment 2 has
also shown that these same stimulus conditions enable listeners to
discriminate /da/ and /ga/ in diotic presentation, when (at SOA = 0) the
stimulus components are physically integrated and the F3 segment is not
perceived as a separate nonspeech stimulus. What is different about the
dichotic situation is the presence of the added nonspeech percept:
Segregation by input channel is effective at an auditory level of perception
but apparently leaves phonetic perception unaffected, at least in the present
paradigm.

This conclusion is also supported by the finding that the range of SOAs
over which above-chance speech discrimination was obtained was very similar in
dichotic and diotic presentation. Thus, even when the isolated F3 segment
preceded the base on the same channel, it was nevertheless (partially)
integrated with the base into a phonetic percept. Thus, the expectation that
listeners would be less tolerant of SOAs in diotic presentation was not borne
out, and the present results in fact suggest that spectral/temporal fusion is
not specific to dichotic presentation at all. Nor is duplex perception: The
F3 segment preceding the base on the same channel is perceived as a nonspeech
event, a case of monaural duplex perception. We conclude that perceptual
integration in phonetic perception operates regardless of mode of stimulus
presentation, and apparently regardless of whether the stimulus appears
unitary or segregated at an auditory level of perception. Although there are
some obvious limits to this dissociation, it nevertheless strengthens further
the traditional distinction between speech and nonspeech modes of perception.

There were two kinds of asymmetries with respect to the effects of SOA.
One of them was equally present in dichotic and diotic presentation: Speech
discrimination was above chance at long negative SOAs but dropped to chance at
long positive SOAs. No such asymmetry was noted by Cutting (1976); however,
the above-chance scores at long negative SOAs replicate the findings of Bentin
and Mann (1983). Some of this asymmetry may be due to (central) masking of
lagging F3 segments by the overlapping vowel; however, it seems that the
above-chance performance with leading F3 segments is the finding in need of
explanation. Only speculation is possible at this time. One possibility is
that leading F3 segments are preserved in (central) auditory memory and
subsequently integrated with the base, whereas lagging F3 segments somehow
cannot take advantage of auditory memory for the acoustically more complex
base. Alternatively, identification of the F3 segment as "high" or "low" may
have exerted a bias on speech identification, which was more pronounced when
the F3 segment led than when it lagged the base. This explanation seems
plausible, especially since the subjects were told about the correspondence of
F3 segment pitch and phonetic category. Although they were also told to pay
attention to the speech percept only, a certain amount of involuntary bias may
have been introduced by leading F3 segments. This bias was equally present in
diotic and dichotic presentation. Assuming, therefore, that the above-chance
performance at long negative SOAs was not due to spectral/temporal fusion
proper, the range of SOAs over which this type of fusion operates seems rather
limited: roughly ±50 ms.

The other asymmetry is the unexpected finding of optimal diotic
performance at short negative SOAs. This was also the region where diotic
performance exceeded dichotic performance. The following explanation may be
proposed: Diotic integration of the stimulus components may have been
uniformly superior to dichotic integration, but at positive SOAs diotic
performance may have been lowered due to peripheral masking of the F3 segment
by the lower formants contained in the base. Rand (1974) and many subsequent
studies have suggested that dichotic segregation of a higher formant from F1
results in a release from upward spread of masking, which thus is largely a
peripheral (channel-specific) effect. In fact, it was surprising that the
present data did not show an absolute advantage for dichotic presentation at
SOA = 0 and at positive SOAs. The upward spread of masking explanation may
account for another feature of the present data that seems difficult to
explain in other terms: Apparently, the asymmetry in the diotic SOA effect was
entirely due to the low F3; stimuli with a high F3 showed no such asymmetry.
The reason for this may be that the high F3 evaded masking by the F1 and F2
transitions. The present data thus seem consistent with earlier findings on
upward spread of masking, if the assumption is granted that dichotic fusion
was not quite as strong as in some of the earlier studies.

An alternative possibility that comes to mind is that an F3 segment
protruding from the base (at short negative SOAs) may have been perceived as
if it were a release burst. This would explain why speech identification was
more accurate at short F3 lead times than at lag times, but it would not be
clear why this asymmetry was present only in the diotic condition and only for
the high-pitched F3. Nor did the 50-ms F3 segments sound like noisy release
bursts; they had a distinct tonal quality. Thus, without additional
assumptions yet to be spelled out, this interpretation cannot account for the
data.

In summary, the present findings reveal dichotic spectral/temporal fusion
to be a phenomenon that is neither specifically dichotic nor specifically
temporal. The fact that a temporally or spatially segregated formant segment
is audible as a separate nonspeech sound is not surprising; that such an
auditorily segregated stimulus component still contributes to an integrated
phonetic percept, however, is an observation that deserves continued
attention. Although Pastore, Schmuckler, Rosenblum, and Szczesiul (1983) have
reported a somewhat analogous phenomenon with musical stimuli, it is still
possible to entertain the hypothesis that the fusion effect studied here
reflects the operation of a central integrative mechanism specialized for
phonetic perception. This hypothesis needs to be tested further with
nonspeech analogs of speech stimuli used in studies of spectral/temporal
fusion.

Brad Rakerd,† Robert R. Verbrugge,†† and Donald P. Shankweiler†††



Abstract. The identifiability of isolated vowels (/V/) was compared
to that of vowels in consonantal context (/pVp/) when subjects
performed a monitoring task. On successive blocks of trials in a test
series, the subjects listened for instances of one or another of
nine monophthongal vowels (/i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, ʊ, u/) and identified
each test item as being an instance or not. On average, resulting false
alarm errors occurred significantly less often in the /pVp/
condition, consistent with the previous finding that vowel perception
may be aided by consonantal context. This beneficial effect of context
was found to be restricted to the class of open vowels, however,
with perception of the close vowels being somewhat hindered by
context. The error data for misses also showed an interaction between
context and vowel height. Various accounts of this interaction are
considered.

Of continuing interest in speech research is the question of whether
vowel perception is affected by the consonantal context in which a vowel
occurs. Perceivers might be expected to exhibit some context sensitivity
because the acoustic correlates of a vowel often vary with changes in the
identity of neighboring consonants (Broad, 1976; Lindblom, 1963; Stevens &
House, 1963). Strong support for this hypothesis comes from studies in which
vowels have been shown to be more identifiable in a consonantal context than
in isolation (e.g., Gottfried & Strange, 1980; Strange, Edman, & Jenkins,
1979; Strange, Verbrugge, Shankweiler, & Edman, 1976).

Recently, this evidence has been challenged on grounds that it is largely
an artifact of the perceptual task subjects have been asked to perform. It
has typically been required that subjects make a multiple-choice
identification judgment by: (1) selecting the "best match" to a presented
vowel from among a prescribed set of alternatives; and (2) indicating their
choice by circling a written form of the alternative on an answer sheet. That
written form can be orthographically related to a presented item in varying
degrees. It can, for example, be an English spelling of the item itself
(e.g., "pep" as the correct response to /pɛp/), or it can be a spelling of a
word that contains the same vowel as the presented item (e.g., "bed" as the
correct response). This relation has been shown to affect vowel
identification performance (Assmann, Nearey, & Hogan, 1982; Diehl, McCusker,
& Chapman, 1981; Macchi, 1980). This variable was not controlled in early
studies of consonantal context (e.g., Strange et al., 1976); therefore the
significance of the obtained effect has been called into question (but see
Strange & Gottfried, 1980).

The significance of the context effect has also been questioned on the
argument that the typical response task (i.e., the searching for and circling
of an appropriate alternative on an answer sheet) is itself somewhat biased in
favor of the context condition. This is owing to the fact that such a task
makes strong demands on short-term memory in that a stimulus trace must be
held long enough to be compared with each of the alternatives. Vowels in
context might be expected to be somewhat better remembered than isolated
vowels for two reasons: (1) vowel-consonant combinations tend to make up words
already represented in a subject's lexicon, or at least portions of such
words; and (2) in English, the orthographic representations of vowels in
context tend to be less ambiguous than those of isolated vowels (see Diehl et
al., 1981, for elaboration on this argument).

In light of these methodological concerns over past work, we thought it
useful to make a comparison of the identifiability of vowels in and out of
context with a different kind of perceptual task than has previously been
employed. While such a task would, no doubt, have certain limitations of its
own, it was felt that if these were sufficiently different from the
limitations of the multiple-choice identification task, the results could
speak to the methodological generalizability of any effects of consonantal
context. The specific task we set for subjects was that of monitoring lists
of test items for instances of particular target vowels. Subjects simply
checked "yes" on an answer sheet if a presented item (an isolated vowel or a
vowel in context) was judged to be an instance of the target vowel being
monitored and "no" if it was not.

This method has two virtues that are noteworthy: It is comparatively
free from orthographic bias since there are no written vowel alternatives on
the answer sheet, and it imposes minimal memory demands on the subject since a
presented item can be immediately judged to match the target or not.
Monitoring thus affords a good comparison with the identification method of
past studies. Here, we strengthened that comparison further by examining
vowel stimuli for which perceptual data had already been collected with the
previous method (Strange et al., 1976).

Experimental Methods 

Stimuli

All /pVp/ and /V/ stimuli were produced by a single male talker who spoke
an Upper Midwestern dialect of English. For each condition, he produced five
tokens of each of the nine vowels. These were organized into /pVp/ and /V/
test series according to the following protocol: (1) 90 items (two
repetitions of each token) were assembled in randomized order to make up a
block; (2) monitoring instructions identifying the particular vowel to be
listened for in that block were inserted at its beginning; (3) instructions
reminding the subject of the target vowel were inserted after the 30th and
60th items; (4) steps (1) through (3) were repeated for a total of nine test
blocks. There was a 2-s pause between test items and a 30-s pause between
blocks.
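
The block structure just described can be sketched as follows. This is a minimal illustration, not the procedure actually used to assemble the tapes: the vowel symbols are a phonetic transcription of the nine vowels, and the function name, event tuples, and random seed are assumptions.

```python
import random

VOWELS = ["i", "ɪ", "ɛ", "æ", "ʌ", "ɑ", "ɔ", "ʊ", "u"]
TOKENS_PER_VOWEL = 5
REPETITIONS = 2


def make_block(target, seed=0):
    """Return the sequence of events for one block that monitors `target`."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    tokens = [(v, t) for v in VOWELS for t in range(TOKENS_PER_VOWEL)]
    items = tokens * REPETITIONS  # 9 vowels x 5 tokens x 2 repetitions = 90
    rng.shuffle(items)
    events = [("instructions", target)]
    for position, item in enumerate(items, start=1):
        events.append(("item", item))
        if position in (30, 60):
            # Reminder instructions follow the 30th and 60th items.
            events.append(("reminder", target))
    return events


block = make_block("i")
n_items = sum(1 for kind, _ in block if kind == "item")
print(n_items)  # 90
```

With this layout each block contains 10 instances of the target vowel and 80 non-target items.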


The monitoring and reminder instructions were recorded by a male speaker
with the same dialect as that of the speaker who had produced the test
stimuli. For both experimental conditions, the monitoring instructions were
given in the following form: "In this test block, you will be listening for
the vowel (exemplar 1), as in (CVC 1), (CVC 2), (CVC 3). Listen for the vowel
(exemplar 2), (exemplar 3), (exemplar 4)." The exemplars were isolated
productions of the vowel. The CVCs were English monosyllabic words that
contained the vowel. The reminder instructions were as follows: "Remember, you
are listening for the vowel (exemplar 5), (exemplar 6), (exemplar 7)."

The order in which vowels were monitored was varied across listeners;
nine different orders were generated with the constraint that each of the nine
vowels was monitored in each ordinal position.
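
One simple way to satisfy this constraint is a cyclic Latin square, sketched below. The text does not say how the nine orders were actually generated, so the cyclic construction and the function name are assumptions chosen only to illustrate the constraint.

```python
VOWELS = ["i", "ɪ", "ɛ", "æ", "ʌ", "ɑ", "ɔ", "ʊ", "u"]


def cyclic_orders(items):
    """Return len(items) orders in which every item occupies every
    ordinal position exactly once (a cyclic Latin square)."""
    n = len(items)
    return [[items[(start + pos) % n] for pos in range(n)] for start in range(n)]


orders = cyclic_orders(VOWELS)
# Every ordinal position (column) contains all nine vowels exactly once.
print(all(len({order[pos] for order in orders}) == 9 for pos in range(9)))  # True
```

A cyclic square balances ordinal position but not which vowel precedes which; a design that also balanced sequence would need a different construction.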

Acoustic characteristics of the stimuli. These stimuli are a subset of
the items employed in a previous study of vowel perception (Strange et al.,
1976). Their acoustic characteristics conform to generalizations reported in
that study. The first of these generalizations is that the formant
frequencies of all isolated vowels except /ɔ/ were comparable to normative
values reported by Peterson and Barney (1952). The deviations in /ɔ/ reflect
an idiosyncrasy of the speaker's dialect. Average first formant frequencies
for the vowels in /pVp/ context were comparable to the values for isolated
vowels. In contrast, the second formant frequencies of /pVp/ vowels were
somewhat "reduced" (cf. Lindblom, 1963) relative to the isolated vowels. That
is to say, they exhibited a somewhat smaller range of deviation about the
average value for all vowels in the set.

The isolated vowels were, on average, considerably longer than /pVp/
vowels. Relative durations of vowels in the two conditions were roughly
comparable, however. As might be expected on the basis of previous reports
(e.g., Peterson & Lehiste, 1960), the vowels /ɪ, ɛ, ʌ, ʊ/ generally were the
briefest in duration, /i, u/ were intermediate, and /æ, ɑ, ɔ/ were the
longest. The exceptions to this were the vowel /u/ in the /pVp/ context and
the vowels /ɑ, ɔ/ in isolation, all of which were somewhat shorter than
expected.

Subjects

Thirty-six undergraduates enrolled in an introductory psychology course
at the University of Connecticut participated in this experiment. They were
randomly assigned to the /pVp/ and /V/ conditions, so that there were 18
subjects in each condition. All of the subjects were adult native speakers of
English with normal hearing. They had no knowledge of the purpose of this
study.

Procedure 

Subjects were asked to monitor the lists of test items for occurrences of
the monophthongal vowels /i, ɪ, ɛ, æ, ʌ, ɑ, ɔ, ʊ, u/. They reported their
decisions by checking "yes" on an answer sheet if an item was judged to be an
instance of the vowel being monitored on a trial and "no" if it was not.

Instructions and test materials were presented over headphones with the
volume adjusted to a comfortable listening level, conditions comparable to
those employed in the previous identification study conducted with these same
stimuli (Strange et al., 1976). Subjects were tested, two at a time, in a
sound-attenuated room. Before the start of testing, they were familiarized
with the stimulus and response materials in the following way: First, the
testing procedure was described. It was explained that a number of different
speech stimuli would be presented and that the task would be to monitor the
vowels in the manner described above. Next, a randomly selected sample of the
stimuli was presented. For the first few trials (approximately 15), subjects
were asked to listen to the stimuli and make no response. Then, they were
given a sample answer sheet and, for 30 trials, monitored the sample items for
instances of a particular target vowel. This target was randomly varied
across subjects. No feedback was given as to the accuracy of these practice
responses; subjects were, however, allowed to ask questions of clarification
about all aspects of the procedure. The test was begun only after all
subjects expressed confidence that they completely understood the task.

Results 

With this monitoring procedure, subjects could make errors of two types:
false alarms and misses. False alarms were erroneous acceptances of vowels
other than the target (responding "yes" when the correct choice was "no").
Misses were failures to recognize actual instances of the vowel being
monitored (responding "no" when the correct choice was "yes"). Neither type
of error was significantly related to the order in which the vowels were
monitored; consequently, the data that will now be considered were pooled
across monitoring orders.
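
The two error types can be scored as in the following sketch. The trial representation (a tuple of presented vowel, target vowel, and "yes"/"no" response) and the function name are assumptions made for illustration; the original scoring was done from answer sheets.

```python
def error_rates(trials):
    """Return (false_alarm_pct, miss_pct) for a list of trials.

    Each trial is (presented_vowel, target_vowel, response), where the
    response is "yes" or "no".
    """
    false_alarms = nontarget_trials = misses = target_trials = 0
    for presented, target, response in trials:
        if presented == target:
            target_trials += 1
            misses += response == "no"         # true instance rejected
        else:
            nontarget_trials += 1
            false_alarms += response == "yes"  # non-target vowel accepted
    fa_pct = 100.0 * false_alarms / nontarget_trials if nontarget_trials else 0.0
    miss_pct = 100.0 * misses / target_trials if target_trials else 0.0
    return fa_pct, miss_pct


demo = [("i", "i", "yes"), ("i", "i", "no"),
        ("u", "i", "yes"), ("u", "i", "no"),
        ("a", "i", "no"), ("a", "i", "no")]
print(error_rates(demo))  # (25.0, 50.0)
```

Restricting the trial list to those on which a given vowel was presented yields per-vowel rates of the kind tabulated in the results.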

False Alarms 

In the left half of Table 1, composite false alarm error rates are
summarized for each vowel category. A composite false alarm resulted whenever
a presented vowel was erroneously taken to be an instance of any of the other
alternatives. For example, in the isolated condition, listeners variously
misheard the vowel /ʌ/ to be an instance of other vowels, including /ɑ/ and
/ʊ/. Together, these false alarms occurred on 8.4% of all trials in which /ʌ/
was the presented vowel but was not, in fact, the target. Since many vowel
pairs (/ʌ-i/ for instance) were seldom if ever confused, averaging over all of
the alternatives in this way generally resulted in rather low error rates.
However, this measure of false alarms is perhaps the most comparable to the
miss percentage to be considered below and it will be seen that the data
exhibit a similar structure.

The two leftmost columns of Table 1 report composite false alarm rates
for the consonantal-context (/pVp/) and isolated (/V/) conditions,
respectively. The difference in error rates between these two conditions is
given in the third column (/pVp/-/V/). Results for the vowel /ɔ/ are reported
separately in the table. This is because the acoustic characteristics of /ɔ/
proved to be abnormal and because this vowel behaved differently than the
other open vowels, both here and in the comparison study of Strange et al.
(1976). (For further consideration of this difference see the Discussion
section below.) Arc sine transformations of the composite false alarm data
shown in Table 1, and of all other data to be discussed, were submitted to
analysis of variance.
Table 1

Average Composite False Alarm and Miss Error Rates (Percentage Errors)

             Composite False Alarms             Misses
Vowel      /pVp/     /V/   /pVp/-/V/     /pVp/     /V/   /pVp/-/V/

/i/          0.5     0.2      +0.3         2.2     1.7      +0.5
/ɪ/          0.3     1.9      -1.6         6.1     1.1      +5.0
/ɛ/          1.7     7.7      -6.0         6.1    11.1      -5.0
/æ/          1.2     2.4      -1.2         3.9    11.7      -7.8
/ʌ/          3.3     8.4      -5.1         6.1    17.8     -11.7
/ɑ/          4.0     5.2      -1.2        26.1    41.1     -15.0
/ʊ/          4.7     2.7      +2.0        17.8    13.3      +4.5
/u/          2.2     0.6      +1.6         6.7     0.6      +6.1

Overall      2.2     3.6      -1.4         9.4    12.3      -2.9

/ɔ/          9.2     6.0      +3.2         9.4     3.9      +5.5

Table 2

Average Error Rates for the Major False Alarm Vowel Pairs
(Percentage of False Alarm Errors)

Vowel Pair       /pVp/     /V/   /pVp/-/V/

/ɛ/-/æ/            4.8    30.8     -26.0
/ʌ/-/ɑ/           14.2    31.6     -17.4
/ʌ/-/ʊ/           15.6    13.0      +2.6
[illegible]        9.5     3.6      +5.9

Overall           11.0    19.8      -8.8

/ɔ/-/ɑ/           50.8    59.7      -8.9
Two of the findings regarding composite false alarms speak to the question
of whether or not consonants exert a contextual influence on vowel
perception. The first is that, overall, error rates in the consonantal
condition were significantly lower than in the isolated condition,
F(1,34) = 4.20, p < .05. This indicates that when listeners monitor vowels,
as when they perform other identification tasks (Gottfried & Strange, 1980;
Strange et al., 1979; Strange et al., 1976), their performance may be
positively influenced by the presence of neighboring consonants. The second
finding is that the beneficial effect of context was not in evidence for all
vowels (see column three of Table 1). Generally speaking, it was the
perception of open vowels that was aided by context, with perception of close
vowels proving to be somewhat poorer in the context condition. The only
exceptions to this generalization were the vowel /ɔ/, which behaved
anomalously throughout, and /ɪ/, which was seldom confused with the other
vowels in either condition. This difference between the open and close vowels
was reflected in a significant interaction between context and vowel height,
F(1,34) = 20.64, p < .001. Post hoc examination of this interaction revealed
that the simple main effect of context was significant only for open vowels,
F(1,34) = 18.20, p < .001.

As noted above, false alarm errors occurred only rarely for many of the
vowel pairs. However, a few pairs did show false alarm rates that were rather
high. These are summarized in Table 2. Note that the mean false alarm rate
for these vowel pairs was at least five times as great as the mean composite
false alarm rate in both the /pVp/ and /V/ conditions. Hence, these
high-likelihood false alarm pairs were the major contributors to overall
error scores. The two observations made about the composite false alarm data
apply to these high-likelihood false alarms as well. First, overall
identifiability of the vowels was enhanced by context. There were
significantly fewer errors in the /pVp/ condition, F(1,34) = 8.88, p < .01.
Second, there was a significant interaction between context and vowel height,
F(3,102) = 11.05, p < .001, reflecting the fact that errors on open vowel
pairs occurred significantly less often in consonantal context,⁴ /ɛ-æ/:
F(3,102) = 25.61, p < .001; /ʌ-ɑ/: F(3,102) = 13.62, p < .001, and those on
the close pair (/ʊ-u/) occurred more often but not significantly so.

Misses

Miss errors are reported on the right half of Table 1. It can be seen
that their overall pattern parallels that of false alarms. Subjects were,
however, much more variable in exhibiting the pattern with misses. As a
consequence, the main effect of context was not significant for these data,
F(1,31) < 1.0. There was a highly significant context-by-vowel height
interaction, F(1,31) = 15.51, p < .001. As before, this resulted from the
fact that performance on the open vowels was significantly aided by context,
F(1,31) = 8.90, p < .01, while that on the close vowels was hindered to a
lesser and nonsignificant degree. Also as before, /ɔ/ behaved differently
from the other open vowels. It was missed more frequently in the consonantal
condition than in isolation.

The Question of Response Biases

Although we have looked at false alarm and miss errors separately, the
two are not strictly independent. Notice, for example, that if the subjects
in this experiment had (for any reason) chosen to respond "yes" on all
monitoring trials, we would have observed no miss errors and 100% false
alarms. Conversely, a bias toward "no" responses would have inflated misses
and deflated false alarms. It is therefore reasonable to wonder whether some
or all of the effects that we observed can be attributed to systematic
response biases. Given the overall patterning of the two types of errors,
this possibility can be confidently rejected. It has been noted throughout
that the data structure of the miss and false alarm errors was roughly the
same. If there were significant response biases, we should have expected the
two types of errors to have complementary distributions, not comparable ones.
For example, we should have expected that the observed interactions between
context and vowel height would have been in opposite directions for the two
types of errors. They were not.



Discussion

We compared listeners' ability to identify vowels in and out of a
consonantal context (/pVp/) when they performed a monitoring task and found
that they made significantly fewer false alarm errors (both composite false
alarms and high-likelihood false alarms) in the /pVp/ condition. This clearly
supports the view that the contextual advantage for vowel perception observed
here and elsewhere (Gottfried & Strange, 1980; Strange et al., 1979; Strange
et al., 1976) is a genuine perceptual effect and not simply a methodological
artifact. At the same time, however, these monitoring results add to evidence
indicating that the demonstrability of a contextual influence may be greatly
affected by task variables. Pooling misses and false alarms, our subjects
made an average of 4.[?]% errors in the /pVp/ condition and 5.3% in the /V/
condition. In the comparison identification study conducted with these same
stimuli (Strange et al., 1976), substantially different error rates were
reported. In that instance, there were 9.7% errors in the /pVp/ condition and
33.1% in the /V/ condition. Clearly, absolute error rates can vary
substantially with the method of assessment, and these form the baseline
against which any relative influence of consonantal context must be measured.

There were two additional points of agreement with the study of Strange
et al. (1976) that merit comment. The first involves the vowel /ɔ/, which did
not behave like the other open vowels in the present instance. It turns out
that perception of /ɔ/ was anomalous in that earlier study as well. This can
be seen in Table 3, which summarizes their multiple-choice identification
data for our speaker's tokens (these data are excerpted from the
segregated-talker condition of Experiment I in Strange et al., 1976). Notice
that with their method Strange et al. observed a contextual advantage for the
identification of all vowels except /ɔ/. It appears that the unusual
perception of this vowel reflects some abnormality in its production. This
conclusion is further supported by the fact that formant frequencies for /ɔ/,
as produced in both conditions, were very different from population norms.

The second point of comparison with Strange et al. (1976) concerns the
perceptual interaction between context and vowel height that we observed.
Some analog to that interaction can also be seen in their data. Note in Table
3 that while all vowels in their /pVp/ condition (except /ɔ/) were identified
more accurately than the isolated counterparts, open vowels were much more
aided by context than the close vowels. The mean contextual advantage
(/pVp/-/V/) for the open vowels was 40.3%, while that for the close vowels
was only 12.7%. In both studies, then, we see some evidence that the presence
of a /pVp/ context differentially affected perception of the open and close
vowels.
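
The two means quoted above can be checked with a short calculation. The sketch below is illustrative only: the error percentages are read from Table 3, and the grouping of /ɛ, æ, ʌ, ɑ/ as open and /i, ɪ, ʊ, u/ as close is an assumption made here for the check, as are the names `TABLE3` and `mean_advantage`.

```python
# Error percentages (/pVp/, /V/) read from Table 3 (Strange et al., 1976).
# The open/close grouping is an assumption made for this check.
TABLE3 = {
    "open": {"ɛ": (1.8, 63.0), "æ": (1.8, 19.0),
             "ʌ": (6.4, 57.0), "ɑ": (42.7, 75.0)},
    "close": {"i": (0.0, 11.0), "ɪ": (0.9, 14.0),
              "ʊ": (15.5, 33.0), "u": (1.8, 11.0)},
}


def mean_advantage(rows):
    """Mean reduction in error (isolated minus /pVp/), in percentage points."""
    diffs = [isolated - in_context for in_context, isolated in rows.values()]
    return sum(diffs) / len(diffs)


open_advantage = round(mean_advantage(TABLE3["open"]), 1)
close_advantage = round(mean_advantage(TABLE3["close"]), 1)
```

Here the advantage is expressed as a positive number (fewer errors in context), which is the sign convention of the running text rather than of the difference column in the table.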
Table 3

Identification Error Rates Determined by Strange et al. (1976). Data are for
the Single Male Talker in Their Segregated-talker Condition of Experiment I.
(Percentage of Identification Errors)

Vowel      /pVp/     /V/   /pVp/-/V/

/i/          0.0    11.0     -11.0
/ɪ/          0.9    14.0     -13.1
/ɛ/          1.8    63.0     -61.2
/æ/          1.8    19.0     -17.2
/ʌ/          6.4    57.0     -50.6
/ɑ/         42.7    75.0     -32.3
/ʊ/         15.5    33.0     -17.5
/u/          1.8    11.0      -9.2
/ɔ/         16.4    15.0      +1.4

Overall      9.7    33.1     -23.4



Though the acoustic and/or articulatory origins of this effect are yet to
be confidently determined, we can make some preliminary observations. First,
we may note that no satisfactory explanation of it is likely to be advanced
in terms of formant frequency differences among the vowels. Owing to the
phenomenon of vowel reduction (Lindblom, 1963), those differences were in
fact less great in the more perceptually distinctive /pVp/ condition. A more
promising acoustic account is that the perceptual effect somehow results from
the greater degree of spectral change associated with open vowels. In /pVp/
context, open vowels are typically marked by more extensive formant
transitions out of and into the flanking consonants than are close vowels.
Vowel height should be particularly related to transitions of the first
formant. There have been speculations that acoustic dynamics of this sort
positively influence vowel perception (Strange, Jenkins, & Johnson, 1983;
Strange et al., 1976).

The acoustics also provide evidence that vowels and consonants were
coarticulated in the /pVp/ condition: vowel formant frequencies were reduced
in this context. This has led us to consider an articulatory account of the
perceptual effect. It may be that the beneficial influence of /pVp/ context
was focused on the open vowels because those vowels are coarticulated with
the consonants in some manner in which close vowels are not. This could
pertain particularly to articulatory movements of the jaw. The jaw lowering
required for production of open vowels must be coordinated with jaw raising
to achieve bilabial closure for the consonants. While production of the close
vowels would likewise call for some jaw lowering (and hence for some
articulatory coordination with the consonants), it is conceivable that this
requirement differs in kind or degree from that for the open vowels. If
listeners are aware of such a coarticulatory difference, it could affect
their interpretation of the acoustic signal.
We plan to distinguish between these alternative accounts of the
perceptual effect by looking at vowel monitoring performance in other
consonantal contexts. While perception of the open vowels was particularly
aided by /pVp/ context in the present study, we expect that rather different
interactions will occur with consonants of some other place and manner of
articulation.
Footnotes

¹Those earlier data are for the single male talker in the
segregated-talker condition of Experiment I in Strange et al. (1976).

²The CVCs in the monitoring instructions were as follows. For the vowel
/i/: green, [?]eak, seal; /ɪ/: bit, tin, sick; /ɛ/: pen, wet, step;
/æ/: hat, fan, map; /ʌ/: cud, gum, rut; /ɑ/: top, sock, dot; /ɔ/: fog, call,
gone; /ʊ/: put, look, should; /u/: boot, cool, moon.
³Because the error rates were often close to zero, they were transformed
according to the following formula suggested by Winer (1962):

$$X' = 2 \arcsin \sqrt{X + 1/N}$$

where X is the original score, X' is the transformed score, and N is the
number of subjects in a condition.

A one-within (vowel height), one-between (context) analysis of variance
was performed on the transformed scores.
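
As a rough illustration of the transformation in this footnote, the sketch below applies it to a proportion. It assumes that the operator inside the square root is a plus sign (the printed formula is hard to read at that point) and that X is expressed as a proportion between 0 and 1. The function name `arcsine_transform` and the clamp at 1.0 are additions made here, not part of the original analysis.

```python
import math


def arcsine_transform(x, n):
    """Return 2 * arcsin(sqrt(x + 1/n)) for a proportion x and n subjects.

    Assumption: the small constant 1/n is added so that scores near zero
    are moved away from the boundary. The min() clamp is an added safeguard
    that keeps the square-root argument within the domain of arcsin.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be a proportion between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return 2.0 * math.asin(math.sqrt(min(1.0, x + 1.0 / n)))


# Hypothetical example: an error rate of 0 with 4 subjects in the condition.
example = arcsine_transform(0.0, 4)  # 2 * arcsin(0.5), i.e. pi / 3
```

The practical effect is to stretch differences among very small proportions, which is why such a transformation is commonly applied before an analysis of variance on error rates.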

⁴In this instance /ɔ/ again proved to behave differently from the other
open vowels. False-alarm confusions between members of the vowel pairs /ɛ-æ/
and /ʌ-ɑ/ were greatly reduced by /pVp/ context for both "directions," i.e.,
with regard to confusions of the first member of the pair with the second and
the second with the first. This was not the case with /ɔ-ɑ/, however. In the
/pVp/ condition, /ɑ/ was misheard as /ɔ/ much less frequently than in
isolation (47.2% false alarms vs. 75.5%), but /ɔ/ was misheard as /ɑ/ more
frequently in this condition (54.4% false alarms vs. 43.9%). The data
reported in Table 2 reflect the average of these two types of confusions.

Abstract. Although native speakers of Japanese may be unable to
identify the phonemes [l] and [r] in English, they, like native
speakers of English, unconsciously take account of certain
articulatory differences between these speech sounds. One implication
is that, preceding a language-specific level of speech perception
where utterances are represented in terms of their constituent
phonemes, there may exist a universally-shared level of speech
perception where utterances are represented as articulatory patterns.

What do native speakers of Japanese perceive as they listen to English
utterances that contain [l] and [r]? In the absence of considerable
experience with spoken English, many Japanese are unable to label,
discriminate, or produce [l] and [r] in a consistent fashion (Goto, 1971;
Miyawaki et al., 1975; Mochizuki, 1981), which would seem to suggest that
they hear these two speech sounds as one and the same. This study offers
evidence that whether or not Japanese subjects can identify [l] and [r]
phonetically, they tacitly perceive an articulatory difference between these
speech sounds.

To demonstrate that Japanese speakers can perceive an articulatory
difference between [l] and [r], though not a phonological one, this study has
focused on a specific context effect in speech perception (for a general
discussion of such effects, see Repp, 1982). The effect occurs when
utterances that end in [l] or [r] precede utterances that begin with [d] or
[g]. It may be demonstrated by placing the spoken syllables [al] and [ar] in
front of stimuli from along a continuum of synthetic speech syllables ranging
from [da] to [ga]. The presence of the preceding syllables causes systematic
shifts in the category boundary between [d] and [g]: When the preceding
syllable is [al], the boundary is shifted towards more [g] percepts (fewer
[d] percepts), relative to that obtained when the preceding syllable is [ar]
(Mann, 1980).
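
One common way to quantify such a boundary shift is to locate the point on the continuum where [d] responses cross 50% in each context and compare the two locations. The sketch below is a minimal illustration under stated assumptions: the response percentages are invented, stimulus 1 is taken to be the [da] end of the continuum, linear interpolation is only one of several possible estimators, and the name `category_boundary` is hypothetical.

```python
def category_boundary(percent_d):
    """Estimate the 50% crossover of a labeling function.

    percent_d[i] is the percentage of [d] responses to stimulus i + 1.
    Returns a fractional, 1-based stimulus number found by linear
    interpolation between the two stimuli that straddle 50%.
    """
    for i in range(len(percent_d) - 1):
        upper, lower = percent_d[i], percent_d[i + 1]
        if upper >= 50.0 > lower:
            return (i + 1) + (upper - 50.0) / (upper - lower)
    raise ValueError("labeling function never crosses 50%")


# Invented labeling data for a seven-step continuum (not from the study).
after_al = [100, 95, 80, 40, 10, 5, 0]    # fewer [d] responses after [al]
after_ar = [100, 100, 95, 80, 40, 10, 0]  # more [d] responses after [ar]

boundary_al = category_boundary(after_al)
boundary_ar = category_boundary(after_ar)
boundary_shift = boundary_ar - boundary_al  # positive: more [g] after [al]
```

With these invented numbers the boundary after [al] falls one stimulus step nearer the [da] end than the boundary after [ar], which is the direction of the effect described above.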
By using the phenomenon known as duplex perception (Liberman, Isenberg, &
Rakerd, 1981; Rand, 1974), it has been possible to demonstrate that the
context effect of [l] and [r] on perception of [d] and [g] is not due to some
general property of acoustic perception, but is highly specific to the
perception of speech (Mann & Liberman, 1983). In duplex perception, one and
the same stimulus is simultaneously heard as speech and as nonspeech. This
situation can be created by dividing synthetic speech syllables along a [da]
to [ga] continuum into two parts: a constant base portion that tends to sound
like [da], and a third formant transition that in isolation sounds like a
"chirp," but when combined with the base provides the critical cue for the
distinction between [da] and [ga]. When base and transition are presented
dichotically, the third formant transition is simultaneously perceived in two
ways: as speech and nonspeech. It provides critical support for the
perception of [da] or [ga] but also for the nonspeech "chirp." Listeners can
be instructed to attend to one or the other of these percepts, and under
instructions to ignore the speech percepts and attend to the nonspeech
chirps, perception is continuous, and no context effect occurs when stimuli
are preceded by [al] or [ar]. In contrast, under instructions to label or
discriminate stimuli on the basis of the speech percepts [da] and [ga],
perception is categorical and the location of the category boundary can be
manipulated by the presence of a preceding syllable [al] or [ar]. Thus the
context effect of [al] and [ar] is evident only when the stimuli are
perceived as speech.

The explanation of why, in speech perception, [l] and [r] alter the
position of the [d]-[g] boundary rests on two related observations. First, it
has been found that the effect of a preceding consonant on the distinction
between [da] and [ga] is not limited to [l] and [r], but extends to the
fricatives [s] and [ʃ] (Mann & Repp, 1981), and that similarities are better
described in terms of articulatory than auditory properties. Specifically,
preceding [l] and [s], which are produced with the tongue relatively forward
in the mouth, shift perception away from [da] toward the more backwards [ga],
relative to preceding [r] and [ʃ], which are produced with a more retracted
tongue posture. Second, it has been shown that the perceptual effects of [l]
and [r] find a parallel in speech production, where, owing to coarticulation,
the acoustic structure of [da] and [ga] can vary as a function of whether
they follow [l] or [r] (Mann, 1980). Both observations support the view that
the context effects of [l] and [r], along with many other context effects and
trading relations (see, for example, Repp, 1982; Repp, Liberman, Eccardt, &
Pesetsky, 1978), represent a perceptual sensitivity to the consequences of
coarticulation in the speech signal. Human listeners appear to possess some
tacit knowledge about articulation and its consequences on the speech signal,
and application of that knowledge may be part of what makes speech perception
"special" (see, for example: Best, Morrongiello, & Robson, 1981; Liberman,
1982; Mann & Liberman, 1983; Repp et al., 1978).

Aside from revealing the special nature of perception in the speech mode,
studies of the context effect of [l] and [r] on perception of [da] and [ga]
can offer insight into the relationship between articulation-based perceptual
adjustment, phonetic perception, and specific language experience, if they
compare native speakers of Japanese with those of English. English and
Japanese share many phonetic types, including [d] and [g], but Japanese does
not distinguish the liquids [l] and [r] (its single "liquid," [r], more
closely resembles an alveolar flap than English [r]). Consequently, absence
of early experience with this phonetic contrast renders many native speakers
of Japanese unable to distinguish English utterances that contain [l] and [r]
in phonetic labeling tasks, discrimination tasks, and in their own
productions (Goto, 1971; Miyawaki et al., 1975; Mochizuki, 1981). Yet two- to
three-month-old American infants have been found capable of making some
discrimination between utterances that contain [l] and [r] (Eimas, 1975), and
the contrast raises questions about the nature of native endowment and the
role of experience in the development of speech perception. The present
context effect offers a means of answering some of these questions.

One explanation of the speech perception abilities of infants vis-a-vis
the phonetic difficulties of native speakers of Japanese is that a lack of
specific experience has led to a loss of all ability to perceive a difference
between [l] and [r] (Eimas, 1975). Another, slightly different view holds
that infants may not perceive [l] and [r] as different phonemes so much as
they perceive them as different articulatory patterns. If so, lack of
experience with the [l]-[r] distinction might lead to an inability to
distinguish [l] and [r] phonetically, but not necessarily to a
desensitization of the basic ability to apprehend the articulatory
differences between them. Using the present context effect, one can test this
possibility by asking whether Japanese subjects who cannot phonetically
categorize [l] and [r] can nonetheless take account of articulatory
differences between them.

Method 

Subjects 

Sixteen college freshmen enrolled in the first semester of a spoken
English course at the University of Tokyo participated in the study. All were
native speakers of Japanese who had never lived in an English-speaking
society. They were selected by the English professor from a population of 150
students, on the basis of either superior (N = 8) or inferior (N = 8)
performance on two standardized tests of spoken English perception and
comprehension. In addition to these native speakers of Japanese, the
experiment further included a control group of ten native speakers of
English. They were undergraduates attending Bryn Mawr and Haverford Colleges.

Procedure

The experiment was divided into three stages and employed materials that
have been described in detail elsewhere (Mann, 1980): a seven-member
synthetic [da]-[ga] continuum and 12 natural tokens of [al] and [ar]. Stimuli
along the [da]-[ga] continuum comprised three-formant syllables in which
systematic variations in the onset of the third formant provided critical
support for the [d]-[g] distinction. They were constructed so as to be
compatible with the natural tokens of [al] and [ar]. Those tokens had been
extracted from natural productions by a male speaker of English of [al-da],
[al-ga], [ar-da], and [ar-ga], in which the first syllable had been stressed.
To control for the possibility of material-specific effects, three tokens of
each production were used.

In the first stage of the experiment, isolated stimuli from along the
[da]-[ga] continuum were presented 12 times each, according to a randomized
sequence. In the second, the [da]-[ga] stimuli were preceded by the tokens of
[al] and [ar] and again presented 12 times in each context, according to an
unblocked randomized sequence. In each stage, a 28-item practice sequence of
the test items preceded the test sequence itself, and the task was to mark
(on a response sheet containing both alphabetic script and Japanese Kana)
whether the stimulus contained [da] or [ga]. The third and final stage
assessed subjects' ability to identify [l] and [r] in the stimuli previously
employed in the second stage of testing, by marking (on a response sheet
written in alphabetic script) whether a given stimulus contained [al] or
[ar]. In light of the potential difficulty of this task, listeners were first
pretrained in the appropriate response categories for 28 items, and then
given a practice sequence of 28 items in which they were told the correct
response before listening to each stimulus. The test sequence then followed,
randomized into a different order from that employed in the second stage of
testing.



Results 



Figure 1 summarizes the results obtained from the native speakers of
English, and the Japanese students who were superior and inferior students of
spoken English. For convenience, the results obtained in the first stage of
testing with isolated [da]-[ga] stimuli are not included in this preliminary
report, as the various groups did not differ in their perception of these
sounds, and as the main interest is in the contrasting effects of [al] and
[ar].

The native speakers of English (Figure 1a) were 100% correct in
identifying [al] and [ar], and showed the anticipated context effect of [l]
vs. [r]. The Japanese speakers who were superior students of spoken English
(Figure 1b) were 99% correct in identifying [l] and [r], which confirms
previous indications (MacKain, Best, & Strange, 1981) that at least some
native speakers of Japanese can master the [l]-[r] distinction. Like the
native speakers of English, these subjects showed the contrasting effects of
[l] and [r] on perception of [da] and [ga]. In contrast to the other two
groups of subjects, those Japanese subjects who were inferior students of
spoken English (Figure 1c) averaged only 5[?]% correct identification of [l]
and [r], which is not significantly better than chance. Nonetheless, they
showed the contrasting effects of [l] and [r] on perception of [da] and [ga].
Analysis of variance reveals significant main effects of stimulus number,
F(6,138) = 905.79, p < .0001, and context, F(1,23) = 31.93, p < .00001, and
an interaction of these two variables, F(6,138) = 130.19, p < .00001, but no
interaction between subject group and context. There was also a main effect
of subject group, F(2,23) = 9.58, p < .0001, an interaction of subject group
with stimulus number, p < .00001, and a small three-way interaction,
F(12,138) = 2.19, p < .015. Each of these reflects the slightly aberrant
behavior of the superior students of English in labeling the endpoints of the
continuum.

Discussion 

Thus, all of the subjects perceived some difference between spoken [l]
and [r], and adjusted their perception of a following phoneme accordingly,
whether or not they could phonetically identify [l] and [r]. If it is
accepted that the context effect of [l] and [r] is specific to speech
perception (Mann & Liberman, 1983) and reflects listeners' sensitivity to the
acoustic consequences of coarticulation, one implication of the ability of
the inferior students of spoken English to be sensitive to the effects of [l]
and [r] while unable to identify them as phonemes is that perception of
speech occurs on at least two levels: articulatory and phonological. The
articulatory level is directly responsible for those context effects and
trading relations in speech perception that rest on the integration,
interpretation, and representation of incoming sensation as the product of
human vocalization. The ability to represent speech sounds at this level is
independent of native language experience; hence speakers are sensitive to
the articulatory properties of the liquids [l] and [r] whether or not those
phonemes are part of their native inventory. Moreover, articulatory
representation may precede phonetic representation, as listeners may perceive
articulatory differences that they cannot phonetically represent. As for the
phonological level of representation, this higher level of speech perception
may permit the phonetic identification of speech stimuli as [al] or [ar],
[da] or [ga], and is available to consciousness. Unlike the articulatory
level, however, it depends upon language experience; hence listeners may
encounter difficulty when they are required to categorize consonants
phonetically that are not in their native inventory.

Figure 1. The contrasting effects of [l] and [r] on perception of the [d]-[g]
          distinction by: a) native speakers of English who are 100% correct
          in identifying [l] and [r]; b) native speakers of Japanese who are
          99% correct in labeling [l] and [r]; and c) native speakers of
          Japanese who perform at chance level in labeling [l] and [r].
          (Horizontal axis: stimulus number; separate curves for stimuli
          preceded by [l] and by [r].)

In most speech perception experiments, subjects' responses are guided by
the phonological level of representation. (Responses could also be mediated
by a higher, lexical level of representation; for a discussion, see Forster,
1979.) Nonetheless, their behavior in identification and discrimination
experiments has led to the view that speech is perceived as if by reference
to the articulatory gestures that convey phonetic segments. Apparently the
representation of those articulatory gestures occurs at a prior level that is
less readily available to introspection (although it might become available
through training). Were it to intervene directly upon consciousness, all
Japanese subjects would be able to draw on their ability to perceive
articulatory differences between [l] and [r], and thus be capable of
distinguishing [l] and [r].

The distinction between phonological and articulatory representation of
speech accounts for the ability and the inability of Japanese subjects to
perceive a difference between [l] and [r], while reinforcing and extending
some other observations in the speech literature. It is consistent with the
fact that the context effect of [al] and [ar] is evident not only when
subjects were required to label these utterances phonetically (Mann, 1980),
but also when subjects are instructed to ignore them, as was the case for the
native speakers of English in the present experiment. It also accords with
evidence that subjects are sensitive to the articulatory properties of vowels
that are not part of their native language ([author illegible], 1981).
Finally, it can offer a perspective on the interpretation of findings about
the speech perception capabilities of infants. Infants have given evidence of
perceiving many phonetically-relevant properties of utterances (see, for a
review, Eilers, 1980; see also Kuhl, 1980; and Kuhl & Meltzoff, 1982), as
well as evidence of trading relations (Miller & Eimas, 1983). It is clear
that infants perceive human speech in a special way, perhaps owing to
proclivities of the left or dominant hemisphere (MacKain, Studdert-Kennedy,
Spieker, & Stern, 1983), which mediates speech perception in adults
(Studdert-Kennedy & Shankweiler, 1970). At present, in the absence of any
means of verifying that infants perceive phonemes as such, it is premature to
accept a conclusion that they are capable of phonological representation. Yet
the data surely imply that infants possess some perceptual abilities that are
the basis of adult phonetic perception (Miller & Eimas, 1983). One of these
could well be the ability to form articulatory representations of incoming
speech stimuli, regardless of specific language experience.

Bruce Kay

Abstract. The departure point of the present paper is our effort to
characterize and understand the spatiotemporal structure of
articulatory patterns in speech. To do so, we removed segmental
variation as much as possible while retaining the spoken act's stress
and prosodic structure. Subjects produced two sentences from the
"Rainbow Passage" using reiterant speech in which normal syllables
were replaced by /ba/ or /ma/. This task was performed at two
self-selected rates, conversational and fast. Infrared LEDs were
placed on the jaw and lips and monitored using a modified SELSPOT
optical tracking system. As expected, when pauses marking major
syntactic boundaries were removed, a high degree of rhythmicity
within rate was observed, characterized by well-defined
periodicities and small coefficients of variation. When articulatory
gestures were examined geometrically on the phase plane, the
trajectories revealed a scaling relation between a gesture's peak
velocity and displacement. Further quantitative analysis of
articulator movement as a function of stress and speaking rate was
indicative of a language-modulated dynamical system with linear
stiffness and equilibrium (or rest) position as key control
parameters. Preliminary modeling was consonant with this dynamical
perspective which, importantly, does not require that time per se be
a controlled variable.

It has often been supposed that temporal organization in biological
systems is ultimately governed by neural rhythm generators, biological
clocks, metronomes, etc. Physiologists and psychologists, confronted with
order in the time domain, have not hesitated to posit clocks whose "ticks"
define when muscles will activate (e.g., Kozhevnikov & Chistovich, 1965;
Rosenbaum & Patashnik, 1980). Our approach, however, has been directed
towards identifying and understanding spatiotemporal pattern in articulatory
events as a dynamic property of natural systems rather than as the result of
the operation of some special neural or mental time-keeping device (cf.
Kelso, Holt, Rubin, & Kugler, 1981). Once elaborated, we believe this
dynamical perspective may afford a principled account of the ubiquity of
temporal constraints in movement in general and in speech in particular. For
example, the internal phasing relations among muscles and kinematic
components in rhythmic activities such as locomotion, scratching,
respiration, and mastication are preserved across scalar changes in force and
rate (cf. Kelso, 1981; Grillner, 1982, for reviews). Similarly, in
electromyographic and kinematic work on speech (Tuller, Kelso, & Harris,
1982, 1983; Tuller & Kelso, 1984), timing of consonant production relative to
vowel production was found to be invariant over substantial changes (induced
by stress and rate) in the duration of the vocalic cycle. These data, along
with other evidence (reviewed by Fowler, 1983), suggest a vowel-to-vowel
organization that places constraints on speech timing.

Although speech certainly involves many of the same body parts as
chewing, its rhythmic basis is not clear, in spite of the fact that linguists
and others have long claimed speech to be rhythmic, and people perceive it to
be so (e.g., Lehiste, 1972; Lenneberg, 1967; Lisker, 1975; Pike, 1945). Yet
experimenters have had enormous difficulty identifying rhythmicity in either
the articulatory or the acoustic domain. One possible reason, as pointed out
by Fowler (1983) with respect to acoustic studies, is that the experimental
measurements typically used may be inappropriate for capturing the natural,
temporal structure of spoken sequences. Speaking is an inherently
multidimensional process; during speech different articulators are involved
to different degrees and the spatiotemporal overlap among movements is
considerable. Confronted with so many simultaneous or nearly simultaneous
events, there seems little chance of our identifying any basic temporal
regularity, even though our perceptual impressions lead us to suppose that
one exists.

Our approach in the present work was to strip away, as much as possible,
the influence of segmental variation on articulatory movement, by asking
subjects to speak "reiterantly." That is, speakers substituted the syllable
/ba/ or /ma/ for each real syllable in the utterance, while mimicking the
utterance's normal prosodic structure. The benefit of the reiterant technique
is that, by minimizing segmental variability while preserving the prosodic
pattern (Liberman & Streeter, 1978; Nakatani, 1977), we are able to measure
the movements of articulators (in this case the lips and jaw) that are
consistently involved in the production of /ba/ and /ma/. In principle, this
procedure affords an analysis of articulator patterns in a simple and
accessible form.

We recognize that the relationship between real speech and reiterant
speech is not always transparent. We should stress, however, that the main
thrust of the present work is to use reiterant speech as a tool to examine
articulatory motions in a speechlike task. We do not claim any necessary
generalization to real speech although one might exist (see also Larkey,
1983). For instance, Liberman and Streeter (1978) show the pattern of
acoustic syllable durations to be similar between real speech and skilled
reiterant speech although the absolute durational values are very different.
In terms of production, it seems unlikely to us that the control of the
lip-jaw system for the production of a reiterant /ba/ is fundamentally
different when the same syllable is produced during natural speech. Indeed,
we shall describe quantitatively certain kinematic relationships (e.g.,
between an articulator's peak velocity and displacement) that have been
observed in many other nonreiterant speech production studies.

In the present paper, we outline a geometric approach for characterizing
the dynamic properties underlying articulatory movements during reiterant
speech. We use the phase portrait to facilitate the analysis of relevant
articulatory variables when speakers produce these simple sequences of
syllables. To our knowledge, phase portrait techniques have rarely been
employed in speech production studies, even though their role is to describe
the forms of motion in complex, multidegree-of-freedom systems (cf. Abraham &
Shaw, 1982). Were one to count the neurons, muscles, and joints that
cooperate to produce even a simple utterance, literally thousands of such
elements would be involved. Yet normal speech is usually coherent and
organized: A low dimensional pattern emerges from a system of high
dimensionality that can be controlled with relatively few dynamic parameters.
Thus our approach is one in which we attempt to characterize regularities of
articulator pattern in terms of a relatively abstract functional organization
(cf. Kelso & Tuller, 1984a). We do not attempt to model peripheral
biomechanics or neurophysiological mechanisms. Rather we use the phase
portrait as a way of uncovering qualitatively the system's control structure
and as a preface to a quantitative treatment of articulatory trajectories. In
doing so we observe both invariant and systematically varying features of
motion when stress and speaking rate are changed. Perhaps most important, our
results, analyzed geometrically and interpreted from a dynamic perspective,
do not require the assumption that time itself is a controlled variable:
instead, the form of articulator trajectories over time is seen as a
consequence of a control structure whose dynamic parameters are functionally
equivalent to those of a mechanical mass-spring system, namely: equilibrium
(or rest) position, which is the position at which the net force on the mass
is zero; and linear stiffness, which is the reactive force per unit
displacement.
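
For reference, the mass-spring analogy can be written out explicitly. What follows is a textbook sketch of an undamped linear spring, not the authors' fitted model; the symbols m (mass), k (linear stiffness), x_0 (equilibrium position), and A and \phi (fixed by initial conditions) are introduced here only for illustration.

```latex
\begin{align}
m\,\ddot{x} + k\,(x - x_0) &= 0, \\
x(t) &= x_0 + A\cos(\omega t + \phi), \qquad \omega = \sqrt{k/m}, \\
\dot{x}(t) &= -A\,\omega\,\sin(\omega t + \phi).
\end{align}
```

In this sketch the cycle duration, 2\pi/\omega, follows from k and m rather than being specified directly, which is one way to read the claim that time need not be a controlled variable.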

I. Methods and Procedures

Two adult speakers (one male [SK, the first author and a native speaker
of an Ulster dialect of English], and one female [DW, a speaker of a New
Jersey dialect of American English]) recited the first and last sentences of
the "Rainbow Passage": (1) "When the sunlight strikes raindrops in the air,
they act like a prism and form a rainbow," and (2) "There is, according to
legend, a boiling pot of gold at one end." After reciting each sentence,
speakers mimicked the prosodic pattern 2-4 times, substituting only /ba/ or
only /ma/ for each syllable. So, for example, "When the sunlight strikes
raindrops in the air" would be mimicked as "ba ba ba ba ba ba ba ba ba ba"
(where underlining indicates a hypothetical stress pattern for the
syllables). Upon completion of the task at a normal, conversational rate, it
was then repeated at a faster rate. One of the speakers (SK) repeated this
procedure at a later date. In all, 392 syllables at each rate were analyzed.
We also obtained measures of each speaker's preferred frequency of jaw
movement over an extended period of time, by asking the subject to "wag" the
jaw at a comfortable amplitude and frequency "as if you were going to do it
all day." "Wagging" movements were then sampled over a 30-s interval.
For speech and nonspeech tasks, vertical displacements of the lips and
jaw were tracked using a device similar in principle to the commercially
available SELSPOT system, which employs infrared LEDs that can be placed
midsagittally on the nose, lips, and point of the chin. Modulated light from
the diodes is captured by a camera equipped with a Schottky planar diode
located in its focal plane. The output of the photodiode is fed to associated
electronics that decode the signals and compute pairs of x and y coordinates.
Up to eight channels of coordinate potentials may be generated
simultaneously, each with a bandwidth of 0-500 Hz. These potentials are then
fed to first-stage DC offset preamplifiers, which center the signals about
the zero DC level. Following the offset adjustment, the coordinate values are
transmitted via DC coupled amplifiers, checked by means of a monitoring
oscilloscope, and recorded. Once the subject was seated with the LEDs in
place, calibration was achieved by raising the camera a known distance (2 cm)
and recording the output of the lower lip LED. Simultaneous acoustic
recordings were also made. The movement data were recorded on FM tape and
sampled at 200 Hz in later computer analysis. This included numerical
smoothing (using a 25-ms triangular window), and differentiation (using a
two-point central difference algorithm; James, Smith, & Wolford, 1977) for
obtaining the derivatives of motion (velocity, acceleration).
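
The smoothing and differentiation steps can be sketched as follows. This is a minimal illustration, not the original analysis program: it assumes that a 25-ms window at 200 Hz spans five samples (weights 1-2-3-2-1), leaves the first and last two samples unsmoothed, and uses one-sided differences at the end points. The function names and the synthetic 5-Hz test signal are hypothetical.

```python
import math

def triangular_smooth(x, half_width=2):
    """Smooth with a triangular window of 2*half_width + 1 samples.

    Assumption: samples closer than half_width to either end are left as is.
    """
    offsets = range(-half_width, half_width + 1)
    weights = [half_width + 1 - abs(k) for k in offsets]  # 1, 2, 3, 2, 1
    total = sum(weights)
    out = list(x)
    for i in range(half_width, len(x) - half_width):
        out[i] = sum(w * x[i + k] for w, k in zip(weights, offsets)) / total
    return out

def central_difference(x, dt):
    """Two-point central difference; end points use one-sided differences."""
    n = len(x)
    d = [0.0] * n
    for i in range(1, n - 1):
        d[i] = (x[i + 1] - x[i - 1]) / (2.0 * dt)
    d[0] = (x[1] - x[0]) / dt
    d[-1] = (x[-1] - x[-2]) / dt
    return d

FS = 200.0        # sampling rate in Hz, as stated in the text
DT = 1.0 / FS

# Synthetic stand-in for a movement trace: 5 Hz, 10 mm amplitude, 1 s long.
position = [10.0 * math.cos(2 * math.pi * 5.0 * i * DT) for i in range(200)]
smoothed = triangular_smooth(position)
velocity = central_difference(smoothed, DT)
acceleration = central_difference(velocity, DT)

# Peak speed away from the record edges (mm/s).
peak_speed = max(abs(v) for v in velocity[5:-5])
```

Both steps attenuate the signal slightly, so the recovered peak speed falls a little below the analytic value of 10 x 2 x pi x 5 mm/s.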

Figure 1 shows an example of the position and velocity of the lower lip
and jaw (i.e., the LEDs attached to lower lip and jaw) for the first part of
sentence 1, "When the sunlight strikes raindrops in the air," where /ba/ is
the reiterated syllable. In the movement traces, peaks and valleys denote the
high and low vertical positions achieved by the indicated articulators. Thus
peaks occur during lip closure for the bilabial stop and valleys occur during
production of the low vowel /a/. In the velocity traces, peaks and valleys
are the maximum velocities attained going into and out of a closure,
respectively. The peaks and valleys were determined by a computer program
which also calculated means (M) and standard deviations (SD) for peak-to-peak
cycle duration and displacement and duration of opening (peak-to-valley) and
closing (valley-to-peak) gestures.
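
A peak-and-valley measurement of this kind might look like the sketch below. It is an illustration under simple assumptions (a noise-free trace, extrema taken as plain local maxima and minima), not a reconstruction of the program used in the study; all names are hypothetical.

```python
import math
import statistics

def find_extrema(x):
    """Return indices of local peaks and valleys (interior samples only)."""
    peaks, valleys = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            peaks.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            valleys.append(i)
    return peaks, valleys

def intervals_ms(indices, fs):
    """Durations between successive indices, in milliseconds."""
    return [(b - a) * 1000.0 / fs for a, b in zip(indices, indices[1:])]

def gesture_durations_ms(starts, ends, fs):
    """Pair each start index with the first later end index."""
    out = []
    for s in starts:
        later = [e for e in ends if e > s]
        if later:
            out.append((later[0] - s) * 1000.0 / fs)
    return out

FS = 200.0
# Synthetic trace: peaks stand for closures, valleys for vowel openings.
trace = [10.0 * math.cos(2 * math.pi * 5.0 * i / FS) for i in range(200)]

peaks, valleys = find_extrema(trace)
cycle_ms = intervals_ms(peaks, FS)                       # peak-to-peak
opening_ms = gesture_durations_ms(peaks, valleys, FS)    # peak-to-valley
closing_ms = gesture_durations_ms(valleys, peaks, FS)    # valley-to-peak

mean_cycle = statistics.mean(cycle_ms)
sd_cycle = statistics.stdev(cycle_ms)
```

Real movement records would need the smoothing step first, and probably a minimum-excursion criterion, so that small ripples are not counted as gestures.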

II. Results and Discussion 

Each of the following sections is designed to be self-contained in that a
discussion accompanies each set of empirical findings. First we present data
pertaining to the global temporal regularity of articulator movement that was
observed in the experiments. Second, a qualitative dynamic analysis of
articulatory motion is presented using the phase portrait to describe the
forms of motion that are produced. Following is a quantitative kinematic
analysis of motion and its derivatives that details effects of the local
changes induced by stress and speaking rate transformations. We try to
maintain continuity of presentation in this quantitative section by
proceeding from lower-order to higher-order kinematic relations. Finally we
present some of our preliminary efforts to model the present articulatory
findings using an approach based in dynamical systems theory and supported by
recent results in the field of physiological motor control.

A. Global Temporal Regularity

First we show separately for the two rates and two reiterant syllables
the mean duration between successive peaks and the associated standard
deviations. The values shown in Table 1 are averaged across subjects and
sentences
Figure 1. Position and velocity over time of lower lip and jaw LEDs for the
reiterant production of "When the sunlight strikes raindrops in the
air." /ba/ is the reiterant syllable.
for both jaw and lower lip motions (i.e., motions of the jaw and lower lip
LEDs). In order to study articulatory motions per se, we have removed
intervals that span major syntactic breaks and the first and last syllables
of the sentence, i.e., where startup, pauses, and lengthening effects
predominate.



Table 1

Means and standard deviations of peak-to-peak duration in ms and frequency (f)
in Hz for jaw (J) and lower lip (LL) during reiterant speech at two rates.
Between-subject standard deviations are in parentheses.

              /ba/                      /ma/                     OVERALL
          J           LL           J           LL           J           LL

NORMAL
  m    213 (5)     212 (8)      212 (3)     211 (3)      212 (?)     211 (6)
  sd    42 (6)       ? (6)       41 (1)      37 (.1)      42 (4)      39 (5)
  f   4.70 (.11)  4.72 (.19)   4.72 (.06)  4.73 (.06)   4.72 (.0?)  4.73 (.14)

FAST
  m    168 (5)     168 (5)      166 (3)     165 (4)      167 (4)     167 (4)
  sd    33 (9)      29 (8)       29 (3)      30 (5)       31 (7)      30 (6)
  f   5.95 (.17)  5.95 (.18)   6.03 (.11)  6.06 (.15)   5.98 (.14)  6.00 (.16)

  n    512         512          272         272          784         784



The durational data show quite low variability regardless of rate, with
coefficients of variation in the 10% to 20% range. The two speakers are also
very similar in their durational behavior as revealed in the small
between-subject standard deviation of the means. Mean cycle durations for the
three experimental sessions were 211 ms (approximately 5 Hz) for the normal
rate and 167 ms (approximately 6 Hz) for the fast rate. In this case, the jaw
exhibits a periodicity similar to that of the lower lip. Not surprisingly,
the data contrast with those of Ohala's (1975) earlier study in which 10,000
consecutive jaw opening gestures were obtained during a 1.5-h reading period.
Ohala (1975) found large durational variance (presumably because of the
presence of pauses and segmental factors) accompanied by a dominant, but
weakly defined, periodicity of about 250 ms (4 Hz). Ohala and others (e.g.,
Lindblom, 1983) have suggested that this periodicity may correspond to the
"preferred frequency of the mandible." However, the preferred wagging
frequencies of jaw movement for our two speakers (0.81 Hz and 2.04 Hz, SDs =
0.06 and 0.21 Hz, respectively) are much slower than the frequencies found
either by us for reiterant speech or by Ohala for read speech. It is clear
then, that neither the sharply defined periodicity observed by us in
reiterant speech nor the weakly defined cycling found by Ohala in read speech
is the same as the preferred frequency of the mandible in our nonspeech task
(see also Nelson, Perkell, & Westbury, 1984, for differences between
preferred frequencies of mandible movement in speechlike and nonspeech
tasks). We also found that the periodicity was unaffected by the syllable
that was used to mimic real speech. The largest mean durational difference
regardless of rate condition between /ba/ and /ma/ for any articulator was 3
ms (see Table 1). In short, when segmental variation is minimized, it is
possible to identify a relatively stable articulatory periodicity. The
periodicity is not perfectly isochronous because there are systematic
variations concomitant with stress and rate (see Section IIC).
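
The frequency and coefficient-of-variation figures quoted above follow directly from the tabled durations. The short sketch below uses the overall jaw means and standard deviations from Table 1 (212 and 42 ms at the normal rate, 167 and 31 ms at the fast rate); the function names are hypothetical, and the tabled frequencies may differ slightly from 1000/mean if they were averaged separately.

```python
def frequency_hz(mean_ms):
    """Cycle frequency implied by a mean cycle duration in milliseconds."""
    return 1000.0 / mean_ms

def coefficient_of_variation(mean_ms, sd_ms):
    """Standard deviation as a percentage of the mean."""
    return 100.0 * sd_ms / mean_ms

# Overall jaw values from Table 1: (mean, SD) in ms.
normal = (212.0, 42.0)
fast = (167.0, 31.0)

f_normal = frequency_hz(normal[0])               # a little under 5 Hz
f_fast = frequency_hz(fast[0])                   # about 6 Hz
cv_normal = coefficient_of_variation(*normal)    # just under 20%
cv_fast = coefficient_of_variation(*fast)        # between 18% and 19%
```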

B. A Geometric (Qualitative Dynamic) Analysis* 

In the following geometric analysis, phase plane trajectories are
generated by continuously plotting the relationship between, in this case,
articulator position, x, and its derivative, velocity, ẋ. As an example,
consider the idealized case shown in Figure 2. The upper trace is a
computer-generated sine wave of 5 Hz with a peak-to-valley displacement
defined to be 20 mm. The peak position corresponds to the consonant closure,
and the valley position to the maximum opening for the vowel. Points of
maximum downward (opening) and upward (closing) velocity fall at the
midpoints of the position trace. To create the phase plane trajectory shown
on the lower part of Figure 2, we plot successive position points and their
corresponding velocities as coordinates on a plane whose vertical axis
denotes position and whose horizontal axis denotes velocity. The arrowheads
on the circle denote the direction of motion on the plane. Thus one cycle or
orbit corresponds to the interval between successive closures, with the
opening gesture on the left half and the closing gesture on the right. Note
that time itself is not an explicit variable in this description.
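
The idealized case can be reproduced numerically. The sketch below builds one orbit for a 5-Hz sinusoid with a 20-mm peak-to-valley displacement and checks that the (velocity, position) points lie on an ellipse; the variable names and the choice of t = 0 at closure are assumptions made for illustration.

```python
import math

FREQ_HZ = 5.0
AMPLITUDE_MM = 10.0                  # half of the 20-mm peak-to-valley displacement
OMEGA = 2.0 * math.pi * FREQ_HZ      # angular frequency in rad/s

def state(t):
    """Position (mm) and velocity (mm/s) at time t (s); t = 0 is the closure."""
    x = AMPLITUDE_MM * math.cos(OMEGA * t)
    v = -AMPLITUDE_MM * OMEGA * math.sin(OMEGA * t)
    return x, v

# One orbit sampled at 200 Hz, stored as (velocity, position) pairs:
# velocity on the horizontal axis, position on the vertical axis.
orbit = [(v, x) for x, v in (state(i / 200.0) for i in range(41))]

# Peak velocity is reached at the midpoint of each gesture, where x = 0.
peak_velocity = AMPLITUDE_MM * OMEGA

# Every point satisfies (x / A)^2 + (v / (A * omega))^2 = 1, an ellipse that
# becomes a circle once each axis is scaled by its own maximum.
residuals = [
    (x / AMPLITUDE_MM) ** 2 + (v / peak_velocity) ** 2 - 1.0 for v, x in orbit
]
```

The first quarter of the orbit has negative velocity (opening), which places it on the left half of the plane, as in the description of Figure 2.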

Figure 3 shows phase plane trajectories for the jaw and lower lip LEDs of
"When the sunlight strikes raindrops in the air," using reiterant /ba/ spoken
at a normal rate. Qualitatively, the shapes of the trajectories are quite
similar across the ten syllables plotted. There is a strong tendency, for
example, for displacement and peak velocity to covary directly (see Section
IIC). Normal and fast reiterant productions for subjects SK and DW of the
second part of the first sentence, "they act like a prism and form a
rainbow," are shown in Figures 4 and 5. The mutual relationship between the
kinematic variables of position and velocity is accentuated by the rate
manipulation, particularly for subject SK. Once again, even when there is a
clear distinction between the trajectories corresponding to stressed and
unstressed syllables, their orbital shapes are generally similar. The
unstressed (sometimes reduced) syllables are characterized by smaller
displacements and peak velocities than the stressed syllables, thus
maintaining a global similarity of (elliptical) trajectory shape across
unstressed and stressed gestures. Also observed, however, are subtle
differences between trajectory shapes associated with different gestural
displacements. For example, the orbits appear to be slightly more compressed
horizontally for larger displacement gestures relative to shorter
displacement gestures. In Section IIC, we will quantify both the global
similarities and subtle differences among gestural trajectory shapes.
Figure 2. Top. Idealized position and velocity over time of articulator
movement. Bottom. Corresponding phase plane trajectories.
Abscissa is velocity, ordinate is position (see text for details).
Figure 3. Left. Position and velocity over time of jaw and lower lip LEDs
for sentence produced with reiterant /ba/ at a normal rate. Right.
Corresponding phase plane trajectories.
Figure 4. Phase plane trajectories of lower lip motions for the second part
of sentence 1, "They act like a prism and form a rainbow" produced
at normal and fast speaking rates with /ba/ as the reiterant
syllable. Subject is SK.
Figure 5. Phase plane trajectories of lower lip motions for the second part
of sentence 1, "They act like a prism and form a rainbow" produced
at normal and fast speaking rates with /ba/ as the reiterant
syllable. Subject is DW.
C. Quantitative Kinematic Analysis

In this section we quantify specific effects of speaking rate and stress
on articulatory movements in an effort to answer the following questions.
First, what kinematic variables or relations among variables might inform us
about the control of speech gestures? Second, what kind of regularity, if
any, exists in the motions of speech articulators across changes in stress
and speaking rate, and how might such regularity be rationalized? Although we
appreciate that there are many idiosyncratic differences among speakers,
dialects, and languages, our emphasis here is on identifying what is common
across such diversity. In short, can we begin to define a "deep structure"
for speech motor control that can be recognized in the face of much surface
variability, and, if so, on what principle(s) is it based?

We begin with an analysis of the space-time characteristics of articulator
movement and its derivatives, with the emphasis now on the gesture (opening
and closing) rather than the cycle. Because of the enormous amount of
kinematic data involved, we restrict our concerns (unless otherwise
indicated) to (a) the motions of the jaw and lower lip complex for the
syllable /ba/ during reiterant speech, and (b) the single experimental
session for each speaker, i.e., omitting the repeated session. This amounts
to 232 gestures for speaker DW (116 opening and 116 closing) and 464 gestures
for speaker SK (232 opening and 232 closing).

The general statistical analysis of the kinematic variables takes the
form of a gesture (opening, closing) X stress (stressed, unstressed) X rate
(normal, fast) analysis of variance for each dependent variable, followed by
correlational analysis between variables (e.g., displacement versus time)
where appropriate. In order to facilitate communication of the results we
report the degrees of freedom for the statistical main effects and
interactions only once. For subject DW the numerator and denominator degrees
of freedom are 1 and 224; for subject SK they are 1 and 456.
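
The reported degrees of freedom can be checked with a few lines of arithmetic. The sketch assumes a fully crossed 2 x 2 x 2 fixed-effects design in which every gesture counts as an independent observation; that assumption is not stated in the text, although it does reproduce the reported values. The function names are hypothetical.

```python
def effect_df(*levels):
    """Numerator df for a main effect or interaction of the given factor levels."""
    df = 1
    for k in levels:
        df *= (k - 1)
    return df

def error_df(n_observations, levels):
    """Denominator df: observations minus the number of design cells."""
    cells = 1
    for k in levels:
        cells *= k
    return n_observations - cells

LEVELS = (2, 2, 2)   # gesture x stress x rate

n_dw = 116 + 116     # opening + closing gestures, speaker DW
n_sk = 232 + 232     # opening + closing gestures, speaker SK

df_dw = (effect_df(2), error_df(n_dw, LEVELS))
df_sk = (effect_df(2), error_df(n_sk, LEVELS))
three_way_df = effect_df(2, 2, 2)
```

Because every factor has two levels, each main effect and interaction has a single numerator degree of freedom, which is why one pair of values per subject suffices.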

1. Displacement, movement time, and their relation.

Tables 2 and 3 provide the mean displacement and mean movement times of
the opening and closing gestures for the syllable /ba/, as a function of
speaking rate and stress. The mean data order systematically for both
kinematic variables in both subjects, although the magnitude of change across
rate and stress is idiosyncratic. Similar results have been reported by
others (e.g., Kuehn & Moll, 1976; Tuller, Harris, & Kelso, 1982b). For
displacement, since the lips always return to closure, the main effect of
gesture type (opening versus closing) was not significant in either subject's
data; Fs = 0.10 and 0.55, ps > 0.05 for DW and SK, respectively. Nor were
there two- or three-way interactions with gesture type. Stressed gestures had
larger displacements than unstressed gestures; Fs = 39.19 (DW) and ? (SK),
ps < 0.0001. Rate had a generally similar effect: Normal rate gestures were
produced with larger displacements than fast gestures, Fs = 11.26 (DW) and
136.18 (SK), ps < 0.001. Unlike DW, subject SK revealed a stress X rate
interaction on the displacement measure, F = 35.41, p < 0.0001. A simple main
effects analysis of this interaction was entirely consonant with the main
effects, however: The difference in displacement as a function of rate was
more apparent in unstressed gestures, F = 162.92, p < 0.0001, than stressed
gestures, F = 8.70, p < 0.004. Similarly, differences in displacement as a
function of stress were manifest particularly at a fast speaking rate,
F = 346.77,
Kelso et al.: A Qualitative Dynamic Analysis of Reiterant Speech

Table 2

Kinematic values of displacement, time, and peak velocity across rate and
stress variations (opening gestures, /ba/)

                        Stressed                  Unstressed
                    d       t       Vp        d       t       Vp

Normal
  DW     M        14.58   123.9   229.2     11.80     ?       ?
         SD        3.68    20.6    71.4       ?       ?       ?
         n                  21                        34
  SK     M        16.02   140.4   262.5     12.63   111.7   238.6
         SD        1.40    19.0    21.9      3.35    30.1    55.8
         n                   ?                        68
Fast
  DW     M        13.41   103.3   241.0     10.38    85.3   216.3
         SD        2.83    12.7    48.8      3.27    11.3    63.6
         n                  24                        37
  SK     M        11.85   120.0   241.1      8.27    81.3   170.2
         SD        1.46    17.9    32.9      3.85    20.6    61.1
         n                   ?                         ?

Displacement (d) in mm; Time (t) in ms; Peak Velocity (Vp) in mm/s

Table 3 

Kinematic values of displacement, time, and peak velocity across rate and
stress transformations (closing gestures, /ba/)

Stressed Unstressed 
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11.9 
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SK 
p < 0.0001, although they were highly significant at the normal rate as well,
F = ?, p < 0.0001 (see Tables 2 and 3). No other interactions were
significant for either subject.

For movement time, opening gestures as a class took longer than closing
gestures. All the movement time values for similar conditions reported in
Table 2 (opening) are greater than those reported in Table 3 (closing), a
finding substantiated by a significant gesture main effect for DW,
F = 171.43, p < 0.0001, and SK, F = 240.57, p < 0.0001. Stressed gestures
take longer than unstressed gestures, Fs = 20.21 and 223.62, ps < 0.0001 for
DW and SK, respectively. For subject DW a simple main effects analysis of the
significant gesture X stress interaction, F = 4.31, p < 0.04, revealed that
the stress effect was greatest for opening gestures, F = 21.58, p < 0.001
(compare Tables 2 and 3). For subject SK, the gesture X stress interaction
was also significant, F = 10.31, p < 0.002: The difference in movement time
between stressed and unstressed conditions was greater for opening gestures,
F = 165.08, p < 0.0001, than closing gestures, F = 68.89, p < 0.0001.

Speaking rate had a systematic effect on movement time. Gestures produced
at a normal rate took longer than those at a faster rate, F = 104.50 (DW) and
F = 181.84 (SK), ps < 0.0001. For subject SK, there was also a gesture X rate
interaction, F = 6.60, p < 0.02. Again, the rate effect between gestures was
a matter of degree; movement time differences between rates were more
apparent in opening gestures, F = 116.86, p < 0.001, than closing gestures,
F = 59.98, p < 0.0001, although clearly the effect was highly significant in
both gesture types.

In summary, in both subjects, the main effects of stress and rate
predominate for both displacement and movement time as dependent measures,
although these effects tend to be greater in opening gestures than closing
gestures. Generally speaking, stressed gestures display greater articulatory
displacement and longer duration than unstressed gestures. Rate has similar
effects: Gestures produced at faster speaking rates are accomplished with
smaller displacements and in shorter movement times than those at a normal
conversational pace.

Viewed from an overall perspective based on the mean data of each
subject, we can make a rather simple statement regarding the
displacement-time relation independent of movement phase (opening versus
closing), rate, or stress. Namely, on the average, displacement covaries
directly with duration. Smaller (larger) displacements tend to be observed at
fast (normal) rates and in unstressed (stressed) environments; duration of
motion adjusts in a corresponding fashion.

These overall effects, therefore, suggest a systematic and apparently
quite linear relationship between spatial and temporal dependent measures.
However, examination of the scatter plots for each subject in Figure 6
(opening phase) and Figure 7 (closing phase) reveals a somewhat more
complicated picture. For subject SK the data follow the general picture
outlined above; amplitude and duration vary in a quite linear way. The
overall correlation for opening gestures is r = 0.82 and for closing gestures
r = 0.76 (ps < 0.01). Moreover, the displacement-time correlations for
individual conditions, shown in Table 4, are, with a single exception,
significant (ps < 0.05).
Figure 6. Scatter plot of amplitude and duration of each subject's lower lip
motions for opening gestures associated with the consonant-vowel
(CV) portion of the syllable. Points are differentiated by rate
and stress, as shown in legend.
Figure 7. Scatter plot of amplitude and duration of each subject's lower lip
motions for closing gestures associated with the vowel-consonant
(VC) portion of the syllable. Points are differentiated by rate
and stress, as shown in legend.
Table 4

Linear correlations (r) and regression slopes (m) of displacement-time
relationship across rate and stress transformations (/ba/).

A. Opening Gestures

B. Closing Gestures

                    Stressed            Unstressed
                    m       r           m       r

Normal    DW       .02    -.11          ?      .31
          SK       .03     .60*        .12     .65*
Fast      DW       .02     .23         .02     .08
          SK       .06     .50*        .15     .52*

*p < .05



The picture is rather different for subject DW, however. In her data the
individual displacement-time pairs are widely distributed and in only one out
of a possible eight conditions (unstressed opening gestures produced at a
normal rate) is there a significant correlation (see Table 4). When opening
and closing gestures are analyzed as a group for DW, significant correlations
are obtained, rs = 0.16 and 0.26 (ps < 0.05), respectively, although the
proportion of variance accounted for is small.
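
To make "proportion of variance accounted for" concrete, the overall correlations reported for the two subjects can be squared. The sketch below uses only the r values given in the text (0.82 and 0.76 for SK, 0.16 and 0.26 for DW); the dictionary layout and function name are illustrative.

```python
def variance_explained(r):
    """Proportion of variance shared by two variables with correlation r."""
    return r * r

# Overall displacement-time correlations reported in the text.
reported = {
    ("SK", "opening"): 0.82,
    ("SK", "closing"): 0.76,
    ("DW", "opening"): 0.16,
    ("DW", "closing"): 0.26,
}

# Shared variance expressed as a percentage, rounded to one decimal place.
shared_percent = {
    key: round(100.0 * variance_explained(r), 1) for key, r in reported.items()
}
```

On these figures the relationship accounts for more than half of the variance in SK's data but well under a tenth of it in DW's.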

To summarize, the coupling between displacement and time is quite
different for the two subjects. One subject (SK) reveals a rather orderly
relation between these variables across rate, stress, and movement phase
(opening vs. closing). The other subject (DW) shows a high degree of overlap
among conditions and a much more homogeneous distribution of
displacement-time data pairs. Indeed, the proportion of variance accounted
for by this relationship is so small as to suggest that, for DW, displacement
and time are essentially independent.

How might these apparent discrepancies between subjects in the
displacement-time performance space be interpreted? One account that merits
mention is that the speech motor system adheres to a minimum cost function such
as "least effort," which might give rise to tradeoffs in articulatory
displacement and duration. This notion of movement costs is elaborated in some
detail in a recent paper by Nelson (1983) and has been applied to an analysis
of jaw movements in repetitive speech and nonspeech gestures (Nelson et al.,
1984). The key idea is that articulatory movements during speech accomplish
system "goals" in the physically most economical fashion, i.e., according to
some "ease of movement" criteria (see also Lindblom, 1983), which in turn
impose boundary constraints on speech motor programming (Nelson, 1983). Such
criteria may be met by minimizing a number of possible articulatory cost
indices such as "effort" (proportional to peak velocity, which bears a direct
relation to the impulse or integral of the force-time curve for a given
movement) or "jerk" (the first derivative of acceleration). Nelson (1983) shows
that although a wide variety of "movement ease" cost functions may be
minimized, the displacement-duration relation remains roughly the same. Thus a
common feature of all such functions is that "cost" increases (on whatever
dimension) are associated with moving a given distance in less time or moving a
greater distance within a given time. To do either requires an increase in
peak velocity, acceleration, jerk, etc. (see also Hogan, 1984).
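
The scaling argument in the preceding paragraph can be illustrated
numerically. The sketch below assumes a minimum-jerk position profile, which is
a common idealization and not a model fitted in this study; the function name
and the sample distances and durations are hypothetical. It shows that peak
velocity, peak acceleration, and peak jerk all rise when the same distance is
covered in less time, or when a greater distance is covered in the same time.

```python
def min_jerk_peaks(distance, duration, n=2000):
    """Peak |velocity|, |acceleration|, |jerk| of a minimum-jerk movement.

    Assumed position profile: x(s) = distance * (10 s^3 - 15 s^4 + 6 s^5),
    where s = t / duration runs from 0 to 1.
    """
    peak_v = peak_a = peak_j = 0.0
    for i in range(n + 1):
        s = i / n
        # Analytic time derivatives of the assumed profile.
        v = distance / duration * (30 * s**2 - 60 * s**3 + 30 * s**4)
        a = distance / duration**2 * (60 * s - 180 * s**2 + 120 * s**3)
        j = distance / duration**3 * (60 - 360 * s + 360 * s**2)
        peak_v = max(peak_v, abs(v))
        peak_a = max(peak_a, abs(a))
        peak_j = max(peak_j, abs(j))
    return peak_v, peak_a, peak_j


# Illustrative values only (mm and seconds), not data from the study.
base = min_jerk_peaks(10.0, 0.2)     # reference gesture
faster = min_jerk_peaks(10.0, 0.1)   # same distance, less time
larger = min_jerk_peaks(20.0, 0.2)   # greater distance, same time
```

Under this assumed profile every peak measure scales with distance and
inversely with a power of duration, so any cost index built from them grows in
both of the cases named in the text.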

In the displacement-time space a relationship such as that displayed by
subject SK in Figures 6 and 7 is suggestive of a fairly constant articulatory
cost (cf. Nelson, 1983, Figure 5). Thus it could be argued that gestures of
short amplitude and duration (e.g., fast unstressed gestures) do not
necessarily cost the system any more than larger amplitude movements of
greater duration (corresponding, say, to normal stressed gestures). Distance
and time mutually adapt to the linguistic requirements of the activity in such
a way as to preserve a relatively constant cost.

A problem, however, with this analysis of "economy of effort" in speech
is that it appears to pertain, at best, to only one of our subjects and to
only one of the three subjects in the Nelson et al. (1984) study. Several
possibilities could account for such a state of affairs. One is that it could
reflect differences in the skill level of producing reiterant speech. That
is, the less constrained, more variable relation between displacement and time
in subject DW suggests that her mode of motor control is not following a
strategy of minimum cost. DW may, in fact, have to discover exactly what that
strategy is. It is well appreciated in the literature (e.g., Larkey, 1983)
that reiterant speech is itself a skill, and it was certainly our impression
that subject DW was not as skilled at "converting" real speech into reiterant
speech as was subject SK. How cost functions change with increasing skill is
a topic open to much further research.

Given that the displacement-time relation is not consistent between
subjects in the present study or in the literature in general (see Nelson et
al., 1984; Parush, Ostry, & Munhall, 1983; Tuller et al., 1982b), the question
is: Are there other observables that might afford insight into the similarity
among subjects in this task? Are subjects really as different in performing
reiterant speech as the displacement-time distributions suggest? As we shall
see, examination of the higher derivatives of motion not only affords a window
into the nature of the system's underlying dynamic organization, but also
suggests that the differences between subjects might be due to the surface
nature of the displacement-time description.




Kelso et al.: A Qualitative Dynamic Analysis of Reiterant Speech



2. Peak velocity and the peak velocity-displacement relation

The phase plane data discussed in Section IIB reveal at least two
interesting features about a given gesture's velocity pattern that merit
further quantification. First, the patterns are largely unimodal (see Figures
3, 4, and 5) in that both opening and closing gestures possess single velocity
peaks. Related to this, peak velocity (Vp) bears a direct relationship to
total impulse (i.e., the integral of the force magnitude as a function of time)
and thus can usefully be used to index the "effort" underlying the movement
(e.g., Nelson, 1983; Schmidt, Zelaznik, Hawkins, Frank, & Quinn, 1979).
Since variables like stress have been associated with articulatory effort
(e.g., Öhman's, 1967, stress pulse theory) it is of interest to assess
quantitatively if and how peak velocity changes with gesture type, rate, and
stress conditions. Second, and perhaps more important, is the apparent
regularity, evident on the phase plane, in the covariation between a gesture's
peak velocity (Vp) and its displacement (d). We consider first the statistical
effects on peak velocity itself; then we evaluate and interpret the
relationship between peak velocity and displacement.

A cursory look at Tables 2 and 3 indicates that Vp, like displacement and
movement time, varies systematically with stress and rate, although in
somewhat idiosyncratic ways. The gesture type main effect is significant for
both subjects, Fs = 59.08 and 111.01, ps < 0.0001, for DW and SK, respectively.
For similar conditions, all the Vp values in Table 3 (closing) are greater
than those in Table 2 (opening). Stress had predictable effects on peak
velocity regardless of gesture type. As in the recent results of Stone (1981)
on jaw movement, and Ostry, Keller, and Parush (1983) on tongue dorsum
movement, stressed gestures are produced with higher peak velocities than
unstressed gestures, F = 15.03, p < 0.002 and F = 201. US, p < 0.0001, for DW
and SK, respectively.

As others have found, however, the effect of speaking rate on the Vp
measure was not so consistent across subjects (e.g., Abbs, 1973; Kuehn &
Moll, 1976; Ostry et al., 1983; Tuller, Harris, & Kelso, 1982b). For subject
DW, peak velocity was greater for the faster speaking rate, F = ^\.9^, p <
0.03. For SK, the opposite occurred (see Tables 2 and 3), F = 9^.^41, p <
0.0001. Moreover, there was a stress X rate interaction for SK, F = 10.06,
p < 0.0001, but not DW, F = O.Sfe, p > 0.05. For SK, although stressed gestures
are always produced more rapidly than unstressed gestures in both speaking
rates, F = 30.93, p < 0.0001 (normal) and F = 210.51, p < 0.0001 (fast), only
unstressed gestures differentiate between normal and fast speaking rates. For
subject SK's unstressed gestures, normal speaking rates have higher Vp than
fast speaking rates, F = 132.50, p < 0.0001. For stressed gestures, no
significant differences in Vp occur between speaking rates (see Tables 2 and
3), F = 1.97, p > 0.05.

Because stress has very systematic effects on a variety of variables
(including not only the kinematics reported here, but EMG as well, e.g.,
Tuller, Harris, & Kelso, 1982a) and the effects of rate are less systematic
across subjects (particularly for Vp), it can be argued that stress and rate
are qualitatively different kinds of articulatory transformations (see Tuller
et al., 1982a, for review). However, the differences observed between stress
and rate remain puzzling at least when viewed on single dimensions (e.g., EMG
amplitude, duration, and articulator velocity), and further work is necessary
to establish the validity of this claim. One potential issue, yet to be fully
explored, is that the subject is usually free to vary the elected rate whereas
stress constraints are more clearly defined. Systematic control of speaking
rate may prove useful and enlightening.

The linkage between peak velocity and displacement, however, is less
ambiguous. This finding in itself is not new; it has been reported before in
other studies of articulation, often as an incidental result (e.g., Kent &
Moll, 1972; Kozhevnikov & Chistovich, 1965; Kiritani, Imagawa, Takahashi,
Masaki, & Shirai, 1982; Kuehn & Moll, 1976; Ohala, Hiki, Hubler, & Harshman,
1968; MacNeilage, 1970; Perkell, 1969; Sussman, MacNeilage, & Hanson,
1973). The particular articulator involved does not appear to be a factor;
the relationship exists in movements at the supralaryngeal level including the
tongue dorsum, tongue tip, lips, and jaw. More recently it has been observed
in both abduction and adduction of the vocal folds (Munhall & Ostry, 1984).

Quantitative analysis of the present data reveals that Vp and d are
highly correlated for opening and closing gestures in both subjects. For
subject SK the correlations, collapsed across stress and rate, are 0.87 for
the opening phase and 0.9^ for the closing phase. For subject DW the
correlations are 0.81 and 0.76 for opening and closing gestures, respectively
(ps < 0.01). Compared to the displacement-time relationship, which was very
different between subjects (cf. Figures 6 and 7), the scatter plots displayed
in Figures 8 and 9 for opening and closing gestures, respectively, show a much
greater degree of overall similarity between subjects in both phases of motion.

The high linearity, of course, is a reflection of the overall temporal
stability present in the opening and closing phases of the articulatory
movements across rate and stress transformations. Since the slope of the Vp-d
relationship for a given gesture type can be expressed in units of frequency, a
perfect correlation between the two variables would indicate that the opening
or closing gestures were of the same frequency, i.e., were perfectly
isochronous. There are, however, local effects of stress and rate when the
data are partitioned into subcategories, as can be seen from the absolute
values of displacement, peak velocity, and duration given in Table 2 for
opening gestures and Table 3 for closing gestures. In Table 5 we present the
linear regression slopes and correlations of the peak velocity-displacement
relationship for opening and closing gestures as a function of stress and rate.
Overall, although the correlations are generally high and significantly
different from zero, the slopes of the relationship between peak velocity and
displacement are quite variable across subcategories. How might the slopes of
the kinematic relation between Vp and d be interpreted with respect to the
control processes underlying the reiterant speech task? First we address the
significance of the overall Vp-d relation, then we consider the specific
effects of rate and stress.
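
The connection between the Vp-d slope and frequency can be sketched with
synthetic data. The example assumes that each gesture is a sinusoidal
half-cycle of duration D, so that Vp = (pi / 2D) d; the displacement values,
the common duration, and the helper names are hypothetical and are not taken
from Tables 2 through 5. When every gesture shares one duration, the regression
is perfectly linear and its slope, in units of 1/s, recovers that duration.

```python
import math


def peak_velocity(displacement, duration):
    """Peak velocity of a sinusoidal half-cycle that covers `displacement`
    (start to end) in `duration` seconds."""
    return math.pi * displacement / (2.0 * duration)


def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept, and Pearson r."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Synthetic, isochronous gestures: displacements in mm, one common duration.
displacements = [4.0, 6.0, 8.0, 10.0, 12.0]
duration = 0.100  # seconds (assumed)
velocities = [peak_velocity(d, duration) for d in displacements]

slope, intercept, r = fit_line(displacements, velocities)
implied_duration = math.pi / (2.0 * slope)  # inverts Vp = (pi / 2D) d
```

If the durations differed across gestures, the points would scatter about the
line and separate subsets would show different slopes, which is the pattern the
text describes for the stress and rate subcategories.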

Recent theoretical considerations and empirical findings in the motor
control field support an account of the Vp-d relation that is based on a
movement's dynamics, not its kinematics. Relations among kinematic variables
are useful to describe the space-time behavior of articulators, but it is
dynamics that cause such motions. That is, it is important to realize that
changes in displacement and its time derivatives (velocity and acceleration)
are consequences of dynamical systems with parameters such as mass, stiffness,
and damping. It is possible, however, to infer the structure of the underlying
dynamics from the kinematics of articulator motions during either discrete or
rhythmic tasks.

Table 5

Linear correlations (r) and regression slopes (a) of peak velocity-displacement
relationship across rate and stress transformations (/ba/)

A. Opening Gestures

B. Closing Gestures

All r's except those marked by an asterisk are significant at p < .01 or

It is now generally recognized that many features of single dimensional
movements in discrete targeting tasks can be generated by second-order, linear
models whose parameters include damping, stiffness, and rest angle (cf. Bizzi,
1980; Cooke, 1980; Fel'dman & Latash, 1982; Kelso & Holt, 1980, for
reviews). In short, the limb exhibits behavior qualitatively similar to a
damped mass-spring system for these tasks (Fel'dman, 1966). Such systems are
intrinsically self-equilibrating in the sense that the "endpoints" or
"movement targets" are achieved regardless of initial conditions. In normal and
deafferented animals, for example, it has been shown that desired head (Bizzi,
Polit, & Morasso, 1976) and limb positions (Polit & Bizzi, 1978) are
attainable without starting position information even when the limb is
perturbed on its path to the goal. Similarly, Kelso (1977) demonstrated that
finger localization ability is not seriously impaired in functionally
deafferented humans, or individuals with the metacarpophalangeal joint capsule
surgically removed, in spite of changes in initial conditions or unexpected
perturbations (Kelso & Holt, 1980; see also Kelso & Tuller, 1983, and Tye,
Zimmermann, & Kelso, 1983, for evidence in speech). Closed-loop notions that
rely on peripheral feedback break down in the face of such evidence. Further,
kinematic variables need not be controlled explicitly. In a dynamic system
like a damped mass-spring (or point attractor, Abraham & Shaw, 1982),
kinematic variations in displacement, velocity, and acceleration occur as a
result of the specified parameter values, and sensory "feedback" in its
conventional form is not required. Nor, importantly, is duration a controlled
variable (see Section



For sustained, stable cyclic movements of dissipative systems the
appropriate dynamic regime is a limit cycle (or periodic attractor, Abraham &
Shaw, 1982). In such systems, the same orbit is achieved regardless of initial
conditions or temporary perturbations. In the absence of imposed
perturbations, such systems can display near-sinusoidal steady-state motions
that may



dynamics. As mentioned earlier, a constant slope in the relationship between
each gesture's peak velocity and displacement for a given set of gestures
indicates that the gestures are perfectly isochronous. With regard to an
hypothesized underlying linear (harmonic) or nonlinear (anharmonic)
mass-spring model, the Vp-d slope is indicative of the stiffness over the range
of gestural displacements examined. Roughly speaking, a constant Vp-d slope for
a given gestural subset implies that the average mass-normalized stiffness
(K*) of the spring functions underlying the gestures is the same across the
observed range.⁴ Recently, Ostry, Keller, and Parush (1983) have shown in a
study of tongue dorsum movement that the slope of the Vp-d relation varies
systematically with stress, but less so as a function of rate. In their data,
particularly for opening gestures, the slope of the relationship was greater
for unstressed than stressed gestures, suggesting to them that the tongue
muscle system was actually stiffer in the unstressed environment (see also
Laferriere, 1982, for evidence leading to the same conclusion). More recently,
observations of tongue dorsum kinematics as a function of rate, vowel
(/u/, /o/, and /a/), and consonant (/k/ and /g/) have been interpreted as
indicative of an underlying mass-spring control regime with constant linear
stiffness for a given gesture (Ostry & Munhall, see also Munhall &
Ostry, 1984).

Our data also suggest that unstressed gestures are characterized by
greater stiffness (K*) values (as revealed in Table 5 by the slopes of the
Vp-d relations and the phase portraits) than stressed ones. This is apparent
in three out of four cases for both opening and closing gestures (Table 5).
Interestingly, we show also that the Vp-d slopes for closing gestures (again
with a single exception) are greater than those for opening gestures,
particularly for unstressed syllables. Like the Ostry et al. (1983) tongue
data, the rate effects on the slope of the Vp-d relationship are less clear
cut. In only five of eight possible cases, slope increases as a function of
rate. With a notable exception, however, in which a four-fold [...], the
changes between fast and normally produced gestures are fairly small.

Although the data in general suggest that stiffness (K*) is a key
system parameter, a comparison of the Vp-d slope data (which indexes K*) and
the displacement data shown in Tables 2 and 3 reveals that stiffness is not
constant for movements of different displacements within a given stress
condition (see also Ostry & Munhall, 1984). In fact, stiffness changes
invariably with displacement both within and across stress categories.

3. The acceleration-displacement function

Two hypotheses can account for the observed variation of stiffness (K*)
with displacement. One is that a linear spring function holds, for which
spring force equals $-kx$ and for which different values of linear stiffness,
k, are selected for, say, shorter amplitude, unstressed gestures than larger
amplitude, stressed gestures. An alternative notion is that during reiterant
speech the jaw-lip system behaves like a soft, nonlinear mass-spring system
where, for example, spring force equals $-kx + \epsilon x^3$, with k and
$\epsilon$ denoting linear and cubic stiffnesses, respectively (cf. Jordan &
Smith, 1977; Kelso, Putnam, & Goodman, 1983). For such springs, the net
stiffness decreases nonlinearly with deviation from the equilibrium position.
Hence, shorter amplitude gestures, involving relatively small deviations from
equilibrium, are characterized by higher average stiffnesses over the course
of the movement than larger amplitude gestures (see also Footnote 4). This
second hypothesis is premised on the assumption that all the motions we have
observed arise from a single underlying nonlinear spring function with
constant linear and cubic stiffness coefficients. Since a gesture's linear
stiffness coefficient is indexed by the slope of the acceleration-displacement
function near the gesture's midpoint (corresponding roughly to its equilibrium
position), we can distinguish between these foregoing alternatives.
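
The amplitude dependence of a soft spring's stiffness can be made concrete
with a few lines of arithmetic. The sketch below assumes the cubic form
F = -kx + eps*x^3 with eps > 0 and averages the tangent stiffness over position
rather than over time, which is a simplification; the coefficient values and
function names are hypothetical.

```python
def spring_force(x, k, eps):
    """Restoring force of a soft cubic spring: F = -k*x + eps*x**3 (eps > 0)."""
    return -k * x + eps * x**3


def local_stiffness(x, k, eps):
    """Tangent stiffness -dF/dx = k - 3*eps*x**2 at deviation x."""
    return k - 3.0 * eps * x**2


def mean_stiffness(amplitude, k, eps, n=1000):
    """Tangent stiffness averaged over positions in [-amplitude, +amplitude].

    Averaging over position (not time) is an assumption of this sketch; the
    analytic value of this average is k - eps * amplitude**2.
    """
    xs = [-amplitude + 2.0 * amplitude * i / n for i in range(n + 1)]
    return sum(local_stiffness(x, k, eps) for x in xs) / len(xs)


# Illustrative coefficients only: one spring function, two gesture amplitudes.
k, eps = 100.0, 1.0
small_gesture = mean_stiffness(1.0, k, eps)   # short amplitude
large_gesture = mean_stiffness(3.0, k, eps)   # large amplitude
at_equilibrium = local_stiffness(0.0, k, eps) # equals k for any amplitude
```

Under the single-spring hypothesis the stiffness measured at the equilibrium
position is the same k for every gesture, even though the average stiffness
falls as amplitude grows. That is why the slope near the midpoint can separate
the two hypotheses.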

The acceleration data of the lower lip-jaw combination were obtained by
velocity differentiation and smoothed over a 25-ms interval (see Section I).
Linear instantaneous, mass-normalized stiffness, K*, was estimated using a
computer routine that first found the midpoint of a given opening or closing
gesture and then obtained the position ($x$) and acceleration ($\ddot{x}$)
coordinates of the data sample to each side of the midpoint. This procedure
allowed us to compute the slope around the hypothesized equilibrium position.
If K* is unchanged across conditions the slopes should be statistically equal.
Thus if the data lie on a single spring function (linear or nonlinear) K*
should be identical close to the movement's midpoint. Different slopes of the
$x$, $\ddot{x}$ function, however, would suggest separate spring functions
with distinct linear stiffness components.
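
The estimation procedure just described can be sketched as follows. This is
not the routine used in the study: the sketch assumes that the "midpoint" is
the sample closest to the spatial midpoint between the gesture's first and last
positions, and that K* is the negated slope of acceleration against position
(so that a restoring spring gives a positive value). The function name and the
synthetic gesture are hypothetical.

```python
import math


def estimate_k_star(positions, accelerations):
    """Mass-normalized stiffness from the samples flanking the midpoint.

    Assumptions of this sketch: the midpoint is the sample nearest the
    spatial midpoint of the gesture, and K* = -(change in acceleration) /
    (change in position) across the two neighbouring samples.
    """
    midpoint = 0.5 * (positions[0] + positions[-1])
    i = min(range(1, len(positions) - 1),
            key=lambda j: abs(positions[j] - midpoint))
    dx = positions[i + 1] - positions[i - 1]
    da = accelerations[i + 1] - accelerations[i - 1]
    return -da / dx


# Synthetic half-cycle of a harmonic oscillator (illustrative values only).
omega = 40.0       # rad/s, so the true K* is omega**2
amplitude = 5.0    # mm
n = 50
times = [i * (math.pi / omega) / n for i in range(n + 1)]
positions = [-amplitude * math.cos(omega * t) for t in times]
accelerations = [-(omega ** 2) * x for x in positions]

k_star = estimate_k_star(positions, accelerations)
```

For measured data the acceleration trace would come from differentiated and
smoothed velocity, as in the text, so the two-sample slope would carry
measurement noise that this noiseless example does not show.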

Figure 10 (inset) shows how K* was estimated and also an example of the
acceleration-displacement differences between the opening and closing gestures
of a stressed versus an unstressed syllable, the fourth and fifth syllables
(underlined) of the reiterant versions of "There is ac-cor-ding to legend"
(fast rate, SK). Differences in slope are apparent, with the shorter
amplitude, unstressed gestures displaying greater K* values than the longer
amplitude, stressed gestures.

Statistical analysis bears this picture out. The mean estimated K* and
its standard deviation are provided for each subject and each gesture type as
a function of conditions in Table 6. Stressed gestures as a class have lower
K* values than unstressed gestures, Fs = 9.38 and 192.13, ps < 0.01 for DW and
SK, respectively. Subject DW displays a gesture type main effect, F = 19.16,
p < 0.0001, with K* significantly greater for closing than opening gestures.
Additionally, for SK there is a gesture type X stress interaction, F = 20.39,
p < 0.0001, and also a gesture X stress X rate interaction, F = I1.70, p <
0.0^. A simple main effects analysis of these interactions revealed that for
SK: a) K* is greater for unstressed gestures than stressed gestures for both
opening, F = 168.85, p < 0.0001, and closing gestures, F = ^43.67, p < 0.0001;

[Figure 10 plot: acceleration versus DISPLACEMENT (mm); panel label OPEN]

Figure 10. Acceleration versus displacement from rest position for the
opening and closing gestures associated with a stressed and unstressed
syllable (see text). The smaller displacements and steeper slopes
correspond to the unstressed gestures. The opening gestures start
at the bottom right; closing gestures start at top left.



Table 6

Estimated stiffness (K*) in units of acceleration per unit displacement across
rate and stress transformations. Standard deviation is in parentheses.

A. Opening Gestures

                    Stressed          Unstressed
Normal    DW        1^**15 (^05)      1703 (561)
          SK        1781 (3^2)        2413 (703)
Fast      DW        1931 (836)?       2336 (787)
          SK        ?                 3555 (1211)

B. Closing Gestures

                    Stressed          Unstressed
Normal    DW        135^^ (3611)      1889 (1196)
          SK        1981 (338)        2306 (311)
Fast      DW        2U09 (378)        2633 (^52)
          SK        2193 (337)        3078 (880)
b) K* is greater for stressed gestures in the closing phase than the opening
phase, p < 0.0<Mi; and c) K* is greater for unstressed gestures in the opening
phase, F = ^^.n, particularly at the fast speaking rate.

The effect of rate is highly significant for both subjects. In all the
cells of similar conditions, K* is greater at the faster speaking rate than it
is in syllables produced at a conversational pace, F = 69.^13, p < 0.0001 (DW)
and F = 90.80, p < 0.0001 (SK). Subject SK also reveals a stress X rate
interaction, F = I1.78, p < 0.0001, although DW does not, F = 1.22, p > 0.05.
For subject SK: a) at both rates, K* is greater in unstressed than stressed
gestures, F = 27.39, p < 0.0001 (normal) and F = 206.55, p < 0.0001 (fast);
and b) only in unstressed gestures and in stressed closing gestures, however,
is K* greater for fast than for normal speaking rates (see Table 6).

These data correspond rather well to the peak velocity-displacement
findings discussed in Section IIC3. The present acceleration-displacement
results, however, afford an additional conclusion, namely, that linear
mass-normalized stiffness (K*, estimated around the equilibrium point of the
motion) is not the same for short amplitude, unstressed gestures as it is for
large amplitude, stressed gestures. In short, different stress categories are
characterized by different K* values. A similar conclusion applies to rate
changes. In all the cells from comparable conditions shown in Table 6, faster
speaking rates are accompanied by higher estimated K* values, and, as we
reported in Section IIC1, smaller displacements. Thus although a constant
linear stiffness model is a reasonable first approximation, it does not handle
all of the kinematic variations in our data that are induced by stress and
rate. For whatever reasons, no doubt in part linguistic, linear stiffness is
modulated according to the stress (or amplitude) of the gesture. Increasing
stiffness for unstressed (shorter amplitude) gestures may be a way for the
English language, conventionally classified as stress timed, to differentiate
its stress categories. Interestingly, recent theorizing in speech perception
argues for a perceptual metric that is closely tied to articulatory dynamics
(e.g., Summerfield, 1979; Studdert-Kennedy, 1983). The notion, based in part
on studies of visual perception (e.g., Runeson & Frykholm, 1981), is that
perception of events is not simply based on surface kinematics, but on the
underlying relations among dynamic parameters that govern such events. The
present findings, in showing a clear relation between stress and linear
stiffness values, provide an initial grounding for these speculations. The
data also show that faster speaking rates are associated with higher estimated
linear stiffnesses, though, like the Ostry et al. (1983) tongue data, the rate
effects are not quite as consistent.

Summary and preliminary dynamic modeling

To summarize, the present data offer insights into both the similarities
and differences in our subjects' articulatory behavior. The movements of both
subjects can be assumed to emerge from the same underlying dynamic
organization. That is, a periodic attractor (limit cycle) control regime can
capture the forms of motion produced by both subjects. The slopes of the peak
velocity-displacement and the acceleration-displacement functions point to
linear mass-normalized stiffness, K*, as a key dynamic parameter. The subjects
differ, however, in the degree to which estimated K* and overall gestural
displacement are coupled across movement conditions. Subject SK shows an
inverse relation between stiffness and displacement for opening (r = -0.77)
and closing (r = -0.73) gestures. Thus larger (smaller) amplitude motions that
accompany stressed (unstressed) gestures and normal (fast) rates are
associated with lower (higher) stiffness. For DW, however, the correlation
between estimated K* and displacement (like her displacement-duration
relation) is low (-0.18 for opening and -0.25 for closing gestures), perhaps
because of the reasons discussed in Section IIC1. In short, the "strength" of
the constraint between K* and displacement may be a useful way to
conceptualize between-subject differences.

The present findings can be couched conveniently within a recent dynamic
modeling and computer simulation framework developed for multiarticulator
systems by Saltzman and Kelso (1983). Briefly, the unique feature of this
approach is that invariant dynamical equations of motion are established
functionally (i.e., at an abstract task level of movement description) for the
particular end effectors directly involved in the task's accomplishment. For
example, Saltzman and Kelso (1983) demonstrate that a constant set of dynamic
parameters defined for a given task, e.g., a hand reaching for a target, can
be used to specify context- (task and posture) dependent patterns of change in
the articulator-level dynamic parameters (e.g., joint stiffness, damping, and
equilibrium points of shoulder, elbow, and wrist). Among other advantages the
approach allows for task-specific trajectory shaping (e.g., Bizzi, Accornero,
Chapple, & Hogan, 1982) and the compensatory behaviors typically involved in
speech (e.g., Folkins & Abbs, 1975; Kelso, Tuller, V.-Bateson, & Fowler,
1981).



At a recent conference, Browman, Goldstein, Kelso, Rubin, and Saltzman
(1984) reported that the task-dynamic approach can be fruitfully applied to
understanding speech organization. For example, we have used the average
values of amplitude and duration from the present data (for stressed and
unstressed gestures at a particular rate) to estimate the dynamic parameters
(equilibrium positions and stiffnesses) in a functional mass-spring model for
the control of lip aperture defined by the vertical distance between the upper
and lower lip. These lip aperture parameters remain invariant throughout a
given lip opening or closing gesture, and during each gesture are transformed
into contextually varying patterns of dynamic parameters at the articulatory
level (upper lip, lower lip, and jaw degrees of freedom as defined in the
Haskins Laboratories' software articulatory synthesizer; Rubin, Baer, &
Mermelstein, 1981). Thus, inserting our empirically estimated dynamic
parameters for lip aperture into the task-dynamic model, we can generate sets
of simulated articulator trajectories associated with lip opening and closing.
Figure 11 illustrates simulated time series and phase plane trajectories for
the resultant vertical motion of the lower lip during a reiterant bilabial
task with simple alternating stress.

Figure 11. Computer simulation of resultant lower lip position and velocity
time series (left) and corresponding phase portraits for a pattern
of reiterant, alternating stress syllables.

In these simulations, the equilibrium position for a given cycle
(closure-to-closure) is specified at the onset of the opening gesture and is
located halfway between the maximum opening position and the (relatively
fixed) closure position. However, because closing gestures are faster than
opening gestures (compare Tables 2 and 3) stiffness is specified twice during
the cycle: once at the start of the opening gesture and once at the start of
the closing gesture.* Although the present example simply shows an alternating
stress pattern, clearly this procedure can be executed on a
syllable-by-syllable basis. Although the model is presently undergoing
refinement (e.g., to incorporate fully limit cycle dynamics), Browman et al.
(1984) have used the displacement-time data shown in Figure 11 as input
parameters to an articulatory synthesizer with promising acoustic and
perceptual results. The point here, however, is that the simulation
illustrates how articulatory trajectories can be generated from a simple
specification of dynamic parameters without explicit or detailed trajectory
planning.
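
The parameter scheme described above can be sketched for a single dimension.
The sketch is not the Saltzman and Kelso task-dynamic model: it assumes an
undamped, unit-mass linear spring, uses the closed-form cosine solution instead
of numerical integration, and models only the resultant vertical lip position.
The equilibrium is placed halfway between closure and maximum opening, and
stiffness is set once per opening and once per closing gesture from that
gesture's duration. All function names and numerical values are hypothetical.

```python
import math


def stiffness_from_duration(duration):
    """Mass-normalized stiffness of an undamped spring whose half-cycle
    (one opening or closing gesture) lasts `duration` seconds."""
    return (math.pi / duration) ** 2


def simulate_gesture(start, target, duration, dt=0.001):
    """Positions and velocities for one half-cycle from `start` to `target`."""
    omega = math.sqrt(stiffness_from_duration(duration))
    rest = 0.5 * (start + target)  # equilibrium halfway between endpoints
    steps = int(round(duration / dt))
    xs, vs = [], []
    for i in range(steps + 1):
        t = i * dt
        xs.append(rest + (start - rest) * math.cos(omega * t))
        vs.append(-(start - rest) * omega * math.sin(omega * t))
    return xs, vs


def simulate_syllables(specs, closure=0.0, dt=0.001):
    """Chain gestures; specs = [(opening_mm, open_dur_s, close_dur_s), ...].

    Opening is taken as downward motion, so the open position is
    closure - opening_mm. Samples at gesture joins are duplicated.
    """
    xs, vs = [], []
    for opening, open_dur, close_dur in specs:
        for start, target, dur in [(closure, closure - opening, open_dur),
                                   (closure - opening, closure, close_dur)]:
            gx, gv = simulate_gesture(start, target, dur, dt)
            xs.extend(gx)
            vs.extend(gv)
    return xs, vs


# Hypothetical alternating stressed / unstressed syllables (illustrative only).
specs = [(10.0, 0.120, 0.100), (6.0, 0.090, 0.075)] * 2
positions, velocities = simulate_syllables(specs)
```

Plotting `velocities` against `positions` would give a phase portrait with
larger orbits for the stressed syllables. Only an amplitude and two durations
per syllable are supplied; the trajectory shape follows from the assumed spring
dynamics.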

III. Conclusions 

The phase portrait methodology introduced in Section IIB, along with a
detailed analysis of articulatory kinematics, allows us a window into the
hypothesized dynamic structure underlying the production of simple, reiterant
syllables. It is popular to propose "time control" as the basis of temporal
organization in speech, as if the system somehow had to program and/or keep
continuous track of time (e.g., Lindblom, 1963; Lindblom et al., ^9S^).
Different time control schemes, according to this notion, correspond to stress
and rate, while other kinematic variables, such as velocity, are
computationally derived (cf. Kuehn & Moll, 1976; Laferriere, 1982). In an
alternative view, which we have applied here, spatiotemporal pattern arises as
a consequence of a dynamic regime in which, at worst, only two articulatory
parameters, stiffness and rest position, are specified according to stress and
rate requirements. Similar arguments have been proposed for the space-time
structure of multidegree of freedom limb movements (Kelso, Putnam, & Goodman,
1983; Kelso, Southard, & Goodman, 1979). The dynamic description captures the
forms of articulatory motion observed in our phase portraits across rate and
stress conditions. It recognizes in full that articulatory motions evolve in
time but it undercuts the necessity to regulate time as a controlled variable
explicitly. Dynamics can provide a grounding for, and a principled analysis
of, so-called intrinsic timing theories of speech production (Fowler et al.,
1980). According to the present findings and supplemented by preliminary
modeling, movement time results from an underlying dynamic organization that
is specified according to linguistic requirements and that remains invariant
throughout the production of a given speech gesture.

Footnotes

¹For many examples of complex, multicomponent systems in physics,
chemistry, and biology, whose cooperative behavior can be described by a small
set of dynamic or "order" parameters, see Haken (1975, 1977). For examples in
speech and other biological motions, see Kelso and Tuller (1984b).

²The field of qualitative dynamics has a rich history dating back to
Poincaré (1899) (see Abraham & Shaw, 1982; Garfinkel, 1983). In this vein,
we combine geometry and dynamics to reflect our concern with the forms of
articulator motion (indicated by patterns of displacement, velocity, and
acceleration over time) that are created by a functionally defined dynamical
organization (e.g., point attractor or periodic attractor dynamics).

³Note that the plotting convention here is not the one typically used in
dynamics, which plots position on the horizontal axis and velocity on the
vertical axis. Since the displacements measured here are vertical, not
horizontal, we have simply switched the axes to conform to the behavior of the
lip-jaw system and to facilitate visualization of the process.

⁴Both linear and nonlinear mass-spring systems can display near-sinusoidal
cyclic motions whose observed peak-to-peak period $T = 2\pi/\Omega_c$, where
$\Omega_c$ denotes the observed angular frequency for the cycle. For systems
with constant parameters, the peak-to-valley duration ($D_p = \pi/\Omega_p$)
and the valley-to-peak duration ($D_v = \pi/\Omega_v$) are equal and,
consequently, $T = D_p + D_v = 2D_p = 2D_v$, and $\Omega_c = \Omega_p =
\Omega_v$. More generally, in cases where motion during each half-cycle is
near-sinusoidal but of different duration we have $\Omega_c =
2(\Omega_p\Omega_v)/(\Omega_p + \Omega_v)$. A linear undamped mass-spring
system (harmonic oscillator) may be characterized by the following equation of
motion with constant parameters:

    $m\ddot{x} + k\Delta x = 0,$

where $m$ = mass, $k$ = linear stiffness, $\Delta x = (x - x_0)$ with $x_0$ =
rest position; and $x$ and $\ddot{x}$ represent position and acceleration,
respectively. Such systems display cyclic motions with period $T =
2\pi/\Omega_c$, where $\Omega_c = \omega_0 = (k/m)^{1/2}$, and $\omega_0^2$
(denoted K* in Section IIC3) defines the mass-normalized linear stiffness of
the system. Due to system linearity, the instantaneous system stiffness is
independent of displacement and, hence, both the instantaneous and the
"average" stiffness of the system for motion cycles of different amplitudes
are simply equal to $k$. Normalizing with respect to mass, we see that the
average mass-normalized stiffness described in the text, $K^*_{av}$, is simply
$k/m$ ($= \omega_0^2 = \Omega_c^2$) for linear mass-spring systems.
Additionally, the peak velocity ($V_p$) for harmonic oscillators is
$\omega_0 A$, where $A$ denotes the maximum displacement from the rest
position during cyclic motion. Consequently a plot of $V_p$ versus $A$ for
different amplitude cycles of a given linear oscillator shows a straight line
whose slope equals $\omega_0$. Thus, for a given linear mass-spring system the
$V_p$-$A$ slope is equal to $\omega_0 = (K^*_{av})^{1/2}$ and is constant
across the entire displacement range. For undamped mass-spring systems with
nonlinear stiffness functions (anharmonic oscillators), however, the average
mass-normalized stiffness, $K^*_{av}$, for motion cycles of different
amplitude is not so simply related to the system's instantaneous stiffness.
For example, for a soft nonlinear spring (cf. Jordan & Smith, 1977) the
equation of motion is:

    $m\ddot{x} + k\Delta x - \epsilon(\Delta x)^3 = 0,$

where $\epsilon$ = cubic stiffness, and all other terms are as in the linear
case. Here, the system's instantaneous stiffness does not equal $k$ but is a
nonlinearly decreasing function of the amplitude of motion. Thus, the system's
$K^*_{av}$ will vary for cycles of different amplitude, with $K^*_{av}$
decreasing for increasing amplitudes. Additionally, the plot of $V_p$ versus
$A$ for different cycles will have a slope that is a decreasing function of
amplitude, unlike the linear systems described above. Yet, like these linear
systems, the $V_p$-$A$ slope is still proportional to $(K^*_{av})^{1/2}$.
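
The relations in this footnote can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function names and all parameter values are hypothetical, and the peak velocity is obtained from energy conservation (release from rest at amplitude A, peak velocity at the rest position) rather than from measured trajectories.

```python
import math


def cycle_frequency(omega_p, omega_v):
    """Observed angular frequency of a full cycle whose two half-cycles are
    near-sinusoidal with angular frequencies omega_p and omega_v."""
    d_p = math.pi / omega_p  # peak-to-valley duration
    d_v = math.pi / omega_v  # valley-to-peak duration
    return 2.0 * math.pi / (d_p + d_v)


def peak_velocity(amplitude, k, m, eps=0.0):
    """Peak velocity of an undamped spring released from rest at `amplitude`.

    For m*x'' + k*dx - eps*dx**3 = 0, energy conservation gives
    (1/2)*m*Vp**2 = (1/2)*k*A**2 - (1/4)*eps*A**4.
    eps = 0 is the linear (harmonic) case. Bounded motion needs A**2 < k/eps.
    """
    return amplitude * math.sqrt((k - 0.5 * eps * amplitude ** 2) / m)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration.
    k, m, eps = 100.0, 1.0, 20.0
    for a in (0.5, 1.0, 1.5):
        linear_slope = peak_velocity(a, k, m) / a
        soft_slope = peak_velocity(a, k, m, eps) / a
        print(a, linear_slope, soft_slope)
```

Under these assumptions the Vp/A ratio stays at (k/m)^(1/2) for the linear system and falls with amplitude for the soft spring, which is the qualitative contrast the footnote describes.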

⁵We had two concerns about the derivation of acceleration. First, we
wanted to ascertain how the selected smoothing window changed the values of
the central portion of the trajectories where the slope of the
acceleration-displacement function was calculated in this study. Second and
relatedly, we wished to ascertain how our derived (smoothed and
differentiated) acceleration compared with actual accelerometric data. Two
independent methods were used to examine the effects of numerical smoothing
and differentiation on the acceleration data; in both cases, the effects were
small. (1) To assess the effects of smoothing on the reported data itself, the
$x$,$\ddot{x}$ slope was calculated for a subset of subject SK's reiterant
productions (16 opening gestures representative of the overall stress/rate
distribution) at four degrees of smoothing: 15 ms, 25 ms, twice at 25 ms (the
condition used in the text), and once at 25 ms and again at ^5 ms. Increased
smoothing reduced mean slope values, F(3,60) = 3.^&, p < .05, but did not
change the pattern relative to overall gesture displacement, F(6,56) = .37,
p > .1. (2) The influence of the double differentiation (i.e., acceleration
derived from position) and concomitant smoothing procedures was tested at a
movement frequency (5 Hz) comparable to that used by speakers in the present
study (see Table I), by comparing the output of an Entran accelerometer (model
EGC-2^0-10D) to the second derivative of position output smoothed twice at
25 ms. Taking into account the gain reduction induced by smoothing (see
above), we found the average, absolute difference between transducer
(unsmoothed) and numerical (smoothed twice at 25 ms) acceleration to be less
than 5 percent of the range of measured movements. The cross-correlation
between the raw, unsmoothed and the smoothed, derived signal was r = .98.
Note: Not all the $x$,$\ddot{x}$ functions approximated straight lines as
closely as those illustrated in Figure 10. Some were S-shaped ("hooked" at
displacement extrema). However, our smoothing procedures did not remove the
"hooks." More important, our estimates of K* were not affected by the
presence of such "hooks."
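
The gain reduction mentioned in this footnote can be illustrated with a synthetic signal. The sketch below is hedged throughout: the window shape (a simple centered moving average), the 5 ms sampling interval, and the function names are assumptions made for illustration, since the smoothing window and sampling rate are not specified here. It derives acceleration from a 5 Hz sinusoidal position trace by smoothing and double differencing, then reports the gain relative to the analytic acceleration.

```python
import math


def moving_average(samples, width):
    """Centered moving average over an odd number of samples. The first and
    last width // 2 samples are copied unsmoothed so the length is kept."""
    half = width // 2
    out = list(samples)
    for i in range(half, len(samples) - half):
        out[i] = sum(samples[i - half:i + half + 1]) / width
    return out


def second_difference(samples, dt):
    """Central-difference second derivative for the interior samples."""
    return [(samples[i - 1] - 2.0 * samples[i] + samples[i + 1]) / dt ** 2
            for i in range(1, len(samples) - 1)]


def acceleration_gain(freq_hz=5.0, dt=0.005, window_ms=25.0, passes=2,
                      duration=1.0):
    """Least-squares gain of derived acceleration relative to the analytic
    acceleration of a unit-amplitude sinusoid (assumed test signal)."""
    n = int(round(duration / dt)) + 1
    omega = 2.0 * math.pi * freq_hz
    position = [math.sin(omega * i * dt) for i in range(n)]
    width = int(round(window_ms / 1000.0 / dt))
    if width % 2 == 0:
        width += 1  # keep the window symmetric about its center sample
    smoothed = position
    for _ in range(passes):
        smoothed = moving_average(smoothed, width)
    derived = second_difference(smoothed, dt)
    analytic = [-omega ** 2 * p for p in position[1:-1]]
    # Discard samples near the ends that were not fully smoothed.
    margin = passes * (width // 2) + 1
    d = derived[margin:-margin]
    a = analytic[margin:-margin]
    return sum(x * y for x, y in zip(d, a)) / sum(y * y for y in a)


if __name__ == "__main__":
    print("one pass :", acceleration_gain(passes=1))
    print("two passes:", acceleration_gain(passes=2))
```

With these assumed settings the gain is slightly below 1 and decreases with each additional smoothing pass, which is the direction of the effect reported above; the exact figures depend on the assumed window and sampling rate and should not be read as the study's values.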




A THEORETICAL NOTE ON SPEECH TIMING* 

J. A. S. Kelso,† Betty Tuller,†† and Katherine S. Harris†††



We wish to make a few brief comments on the commentators' remarks and
then introduce a representation of interarticulator timing in which time
itself is not explicitly involved. To show that such a representation is valid
will require a recasting of what data there are on relative timing (e.g.,
those discussed by Löfqvist) into a geometrical, phase portrait description of
articulator trajectories. We have begun to do this. The phase portrait
captures the forms of motion caused by an underlying dynamic organization
(Abraham & Shaw, 1982; Kelso & Bateson, 1983; Kelso, Holt, Rubin, & Kugler,
1981; Saltzman & Kelso, 1984), in which time as we traditionally measure it
(e.g., as duration, latency) is nowhere to be seen. We believe that certain
advantages for understanding speech motor control and developing articulatory
models accrue immediately from this perspective. But first some comments.

1. Our paper presents a systematic set of data in favor of relative
timing among pertinent articulatory gestures. It is an effort to understand
the behavior of an articulatory system that is stable across linguistically
meaningful transformations. Relative timing, as we propose it, is simply an
index of a temporally stable state. It should not be considered as mandatory
(Perkell) or necessarily inherent (Clark & Palethorpe). Clark and Palethorpe
(this volume) set up a binary distinction (acoustic versus articulatory) that
is not one we have ever subscribed to.

2. Our paper identifies the timing of articulatory gestures associated
with consonants relative to those associated with flanking vowels. Because,
strictly speaking, the data presented by Perkell (though interesting) do not
pertain to this issue, brevity precludes any extended commentary. The reader
should be aware, however, (a) that other, very different accounts exist of
coarticulatory effects of the kind discussed by Perkell (e.g., Bell-Berti &
Harris, 1979); (b) clearly many variables are involved in any account of
speech production, as Perkell notes. To say, however, that the variability in
observable output (e.g., trajectories) is accounted for by the variability in



*Authors' response to comments by P. F. MacNeilage, A. Löfqvist,
J. E. Clark, S. Palethorpe, and J. S. Perkell, on a paper by K. S. Harris,
B. Tuller, and J. A. S. Kelso entitled "Temporal invariance in the
production of speech." In J. S. Perkell & D. Klatt (Eds.), Invariance and
variability in speech processes. Hillsdale, NJ: Erlbaum, in press.
†Also Departments of Psychology and Biobehavioral Sciences, University of
Connecticut.

††Also Cornell University Medical College.
†††Also Graduate Center, City University of New York.

Acknowledgment. This paper and the research discussed herein were
supported by NINCDS Grants NS-13617, NS-17778, BRS Grant RR-05596, and ONR
Contract No. N00014-83-C-0083. Thanks to Elliot Saltzman for very helpful
suggestions on the manuscript.

[HASKINS LABORATORIES: Status Report on Speech Research SR-79/80 (1984)]




programming is circular at best (cf. Kelso, 1981); (c) Abbs' paper (this
volume and elsewhere), MacNeilage's writings (e.g., MacNeilage, 1980) and our
experiments (Kelso, Tuller, Bateson, & Fowler, 1984; Kelso, Tuller, & Fowler,
1982) all converge in showing that the speech motor control system does not
program targets in articulator space (cf. Perkell); and (d) we certainly
agree with Perkell that more data are needed, but so are new concepts.

3. In our final remarks we want to return to the theme of our paper,
namely, the relative timing among articulatory gestures. We wish to show how,
by examining the data using phase plane techniques, an entirely different
conceptualization for the relative timing finding emerges. We are presently
analyzing existing data and conducting new experiments to examine this
conceptualization further. Consider the simple case in which the latency (in
ms) of onset of upper lip motion for a medial consonant is measured relative
to the interval (in ms) between onsets of jaw lowering motions for flanking
vowels. These events are displayed in the idealization of Figure 1A, in which
the duration of the V1 to V2 cycle is Jd (in ms) and the latency of upper lip
motion is Ll (in ms). As we have shown, the two events are highly correlated
across rate and stress changes. That is, the lip latency varies
systematically with jaw cycle duration plus an intercept value that seems to
change across phonetic context and speaker (see Figure 1B). Note that this is
a strictly temporal description. One could posit, in this example, that
somehow the system is keeping track of the duration of jaw motion such that
when a given amount of time has passed, another articulator, say the upper
lip, is activated. Such an account of speech or limb movement control is not
unusual.




Figure 1. A. An idealized time series description of jaw and upper lip
motion. B. The empirical relation between jaw cycle duration, Jd,
and upper lip latency, Ll (see text for details).

A very different view of these events emerges when the articulatory data
are expressed as motion trajectories on the phase plane. Two quantities are
needed to do this, the articulator's displacement ($x$) and its velocity
($\dot{x}$). These quantities may be considered to be the coordinates of a
point on the articulator in two dimensional space, the phase plane. As time
varies, the point P($x$,$\dot{x}$) describing the motion of the articulator
moves along a certain path on the phase plane. Note that time, although
implicit and recoverable from this representation, does not appear explicitly
in the phase plane description. For different initial conditions, the
corresponding paths will be different, and the set of all possible
trajectories constitutes the phase portrait of the system's dynamic behavior.
Finally, one can transform the Cartesian $x$,$\dot{x}$ coordinates into an
equivalent polar form described by a phase angle ($\phi =
\tan^{-1}[\dot{x}/x]$), and a radial amplitude ($A = [x^2 +
\dot{x}^2]^{1/2}$). In discussions below, the phase angle is a key concept in
our interpretation of interarticulator timing phenomena.
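
The Cartesian-to-polar transformation just described can be written out directly. This is a minimal sketch with a hypothetical function name; it uses the two-argument arctangent so that the phase angle is resolved over all four quadrants, which is an assumption beyond the single-argument form given in the text.

```python
import math


def to_polar(x, x_dot):
    """Convert a phase plane point (displacement, velocity) into a phase
    angle in radians and a radial amplitude."""
    phase = math.atan2(x_dot, x)      # four-quadrant form of atan(x_dot / x)
    amplitude = math.hypot(x, x_dot)  # (x**2 + x_dot**2) ** 0.5
    return phase, amplitude


if __name__ == "__main__":
    # A unit-frequency, unit-amplitude undamped oscillation sampled in time:
    # x = cos(t), x_dot = -sin(t). The radial amplitude stays constant while
    # the phase angle changes steadily, with no explicit time variable needed
    # once the point (x, x_dot) is known.
    for t in (0.0, 0.5, 1.0, 1.5):
        print(t, to_polar(math.cos(t), -math.sin(t)))
```

Note that displacement and velocity carry different units, so in practice some normalization of the two axes would have to be chosen before the angle is interpreted; none is specified here.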

Figure 2A illustrates the mapping from time domain to phase plane for a
motion trajectory generated by a simple, undamped mass-spring system.¹ In a
similar fashion, Figure 2B shows the phase plane trajectory for the idealized
jaw motion described as a time series event in Figure 1A. In the phase plane,
this jaw motion describes a closed orbit, since the jaw goes from closed to
open and back to closed in one cycle. Note that, in comparison to Figure 2A,
the axes in Figure 2B have been interchanged in order to express pictorially
that the jaw moves vertically in space. In the phase plane, one can plot jaw
motions during V1V2 intervals of different duration, and can identify the
onset of upper lip motion during each cycle with an onset phase angle for that
cycle. Our hypothesis is that the phase angle for upper lip onset should be
the same across jaw cycles of different shape, i.e., across different rate and
stress conditions. Two idealized examples are illustrated in Figure 3. In
one, a small orbit is shown, corresponding to a small displacement of the jaw
over time; in the other, a larger orbit is shown. The phase angle of upper
lip onset, $\phi$, is predicted to be invariant as shown in the right hand
side of Figure 3, though we do not claim it to be the one shown here.² Note
that the onset of a remote articulator (e.g., the upper lip) is now with
reference to the phase angle of another articulator (e.g., the jaw). This
angle is therefore a function of the latter articulator's position and
velocity, not merely its absolute position or velocity. Moreover, there is no
need to posit any time-keeping mechanism or time controller. In this view,
individuals can produce articulatory motions of different durations or
magnitudes without affecting the hypothesized regularity in onset phase angle.
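
The hypothesis can be illustrated for the special case of a purely sinusoidal jaw cycle. The sketch below is an idealization with hypothetical names and values: it assumes a cosine-shaped jaw cycle of duration `jd`, divides velocity by the cycle's angular frequency so that the orbit is circular, and uses a sign convention in which phase increases with time. Real jaw trajectories need not be sinusoidal, and the empirical latency relation includes a nonzero intercept that this sketch does not model.

```python
import math

TWO_PI = 2.0 * math.pi


def jaw_state(t, jd, amplitude):
    """Displacement and velocity at time t of an idealized sinusoidal jaw
    cycle with duration jd (same time units as t)."""
    omega = TWO_PI / jd
    x = amplitude * math.cos(omega * t)
    x_dot = -amplitude * omega * math.sin(omega * t)
    return x, x_dot


def phase_angle(x, x_dot, jd):
    """Phase angle in [0, 2*pi), with velocity normalized by the cycle's
    angular frequency (an assumed normalization)."""
    omega = TWO_PI / jd
    return math.atan2(-x_dot / omega, x) % TWO_PI


def onset_latency(onset_phase, jd):
    """Latency at which the idealized cycle reaches a given onset phase."""
    return onset_phase / TWO_PI * jd


if __name__ == "__main__":
    onset_phase = 2.0  # radians; hypothetical value
    for jd, amplitude in ((200.0, 5.0), (300.0, 10.0)):  # ms, mm (assumed)
        latency = onset_latency(onset_phase, jd)
        x, x_dot = jaw_state(latency, jd, amplitude)
        print(jd, latency, phase_angle(x, x_dot, jd))
```

In this idealized case the lip latency differs between the short and the long jaw cycle while the onset phase angle recovered from position and velocity is the same, so a single phase value accounts for latencies that scale with cycle duration.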

To summarize our theoretical points: When representing articulatory
motions geometrically on the phase plane, neither absolute nor relative time
need be extrinsically monitored or controlled. This fact potentially provides
a grounding for, and a principled analysis of, so-called intrinsic timing
theories of speech production (e.g., Fowler, Rubin, Remez, & Turvey, 1980; see
also Kelso & Tuller, in press). Our view is supported indirectly by
demonstrations in the articulatory structures themselves of afferent bases for
phase angle information (e.g., position and velocity sensitivities of muscle
spindle and joint structures), but not for time-keeping information (e.g.,
time receptors; cf. Kelso, 1978). It might well be the case that certain
critical phase angles provide information for orchestrating the temporal flow
of activity among articulators (beyond those considered here) and/or vocal
tract configurations. Such phase angles would serve as natural, i.e.,
dynamically specified, information sources for coordinating speech.
Interarticulator speech coordination thus may be captured better with
reference to events that are specified by the system's dynamics than with
reference to sets of durational rules. These ideas and others are also
currently being explored by dynamic modeling (Saltzman & Kelso, 1984).

Figure 2. A. The mapping of a simple undamped mass-spring motion onto the
phase plane. B. The jaw cycle of Figure 1A characterized on a
"functional" phase portrait, i.e., displacement is on the vertical
axis and velocity on the horizontal axis. The polar coordinates,
the phase angle, $\phi$, and the radial amplitude, A, are also shown
in the diagrams (see text for details).

Figure 3. The phase position of the upper lip relative to the jaw cycle for
different jaw orbits (see text) and the consequent hypothesized
relation (see Footnote 1).

Footnotes



¹Note that the jaw motion, though idealized here, does not have to be
(and is not usually) sinusoidal. Thus, different relative timing relations
among articulators can give rise to the same phase position between
articulators and vice versa. The determining feature is the shape of the
trajectories (for many more details, see Kelso & Tuller, in press).

²To date, we have examined this relationship for two speakers and two
phonetic contexts, /babab/ and /bawab/. In each case, the phase angle of the
upper lip for the medial consonant relative to the jaw trajectory was
unaffected by changes in stress and speaking rate.



F PATTERNS*

Abstract. When a sequence of pictures is presented in rapid succession,
the illusion of continuous movement can be created. A
continuously varying acoustic signal may, contrariwise, be perceived
as a sequence of "still" sounds. Not only is speech perceived as
discrete sounds in sequence, but speakers will oblige, especially in
the case of stressed vowels, by "citing" them in the form of steady
state phonations judged to match auditorily the vowels in their
natural contexts. These steady state imitations are adequately
characterized by just two numbers, the frequencies of the two lowest
vocal-tract resonances. Acoustic analyses of a number of tokens of
the English nonsense forms or words [bɛb, dɛd, gɛg, bæb, dæd,
gæg] produced in the frame Please pronounce ___ once again by four
talkers indicate that there is a within-talker pattern of variation
rather different from the variations over speakers reported by
Peterson and Barney (1952). Moreover, the variation patterns are
different within syllable types, for the same vowel across the
contexts examined, and for the two formants. There are differences in
the way in which F1 and F2 vary with variation in stop place of
articulation and in the voicing of the postvocalic stops. These
variations are in some cases of a kind to pose difficulties for the
target-plus-undershoot model as the explanation for the variations
observed. They are of a magnitude, moreover, that should discourage
an attempt to classify vowels automatically on the basis of F1-F2
frequency measurements at a single point on their trajectory and
without regard to their context.

*This paper was presented orally at the 107th meeting of the Acoustical
Society of America, Norfolk, VA, 6-10 May, 1984.

†Also University of Pennsylvania.

Acknowledgment. This work was supported by NICHD grant HD-01994 to Haskins
Laboratories.

The tradition of representing vowel quality acoustically by a point in
the plane whose dimensions are the frequencies of the two lowest resonances of
the vocal tract is a long one, with its beginning at a time when the analysis
of speech and other nonstationary signals was not possible. Instead of normal
speech, the objects of analysis were vocalizations produced with a vocal tract
held in fixed position over relatively long intervals. Such vocalizations are
speech only by courtesy of the fact that they are judged auditorily to match
vowels as components of speech events. Considerable attention is still being
given these "nonspeech" sounds, whether of vocal tract or machine origin, but
for a different reason, namely, that the dynamic nature of speech activity
perturbs what is hypothesized to be the underlying structure of speech events,
which we want to think of as sequences of discrete elements, each a complex of
features that jointly determine membership in one of a limited set of sound
categories. For each category a certain articulatory target and associated
acoustic pattern are posited. This target emerges more or less clearly in the
so-called null context and in some few others, such as English /h-d/, where
disturbing coarticulatory effects are said to be minimal. Contamination by
context is both hailed as an essential property of speech and condemned as a
confounding (and confounded) impediment to determining a straightforward
relation between acoustic signal and linguistic percept.

Different opinions, undoubtedly based on different equally valid
observations, have been expressed regarding the scope of contextual
perturbation of vowels. Thus Schouten and Pols (1979) found that the Dutch
vowels they studied had steady state intervals whose spectral shapes varied
little with context, but Lindblom and Studdert-Kennedy (1967) reported that in
CVC syllables the formants rarely "reach a steady state," and that under
changes in the overall duration of synthetic CVC patterns there are shifts in
vowel identification. Presumably these shifts tell us something about
variations in natural speech, and specifically about variations in what we
call the primary correlates of vowel quality, the F1 and F2 frequencies.

If we accept provisionally the idea that the best place to sample the
moving F-pattern to determine F1 and F2 frequencies that will serve as the
optimal index of vowel quality is at the point of maximum F1, then it seems to
me of some interest to learn how much variation this measure will uncover, and
what part of it, if any, may be systematic and attributable to differences in
context, and also to see how the magnitude of such context-dependent variation
compares with F1 and F2 differences separating distinct vowel categories that
are contiguous on the F1-F2 plane. In order to get answers to these questions
I recorded the speech of three male speakers of American English varieties
that seem very similar phonetically in their [ɛ] and [æ] vowels. The three
speakers produced CVC syllables in the carrier sentence Please pronounce ___
once again. Fifteen repetitions of each syllable type were subjected to LPC
analysis. A fourth speaker carried out the same recording and analysis
procedure as part of his research project for a university seminar course. The
data so far analyzed do not allow point-by-point comparison across the four
speakers, and a good deal of recorded speech awaits analysis, but already some
regularities in the relation between stop context and F1-F2 variations are
evident.

In Figure 1 the formant frequencies for 15 tokens of each of the
syllables [gɛg, dɛd, bɛb, gæg, dæd, bæb] are represented. Mean values of F1
and F2 are indicated by the location of the intersections of lines whose
lengths represent magnitudes of the standard deviations for each formant
frequency. It is clear that in the productions of speaker L the first formant
is not subject to any major perturbation by context, but that F2 for both the
[ɛ] and the [æ] syllables has lower frequencies in labial stop contexts than
in dorsal.

Figure 1. Means and standard deviations of F1 and F2 frequencies, measured at
time of maximum F1 frequency, for fifteen tokens of each of six
syllable types produced by a single talker L in the frame sentence
Please pronounce ___ once again. Lines indicating ± one standard
deviation intersect at point representing mean formant frequencies.

Figure 2 shows F1 and F2 plots for the [ɛ] and [æ] syllables in labial
and dorsal stop contexts as produced by all four speakers. While there are
slight differences among speakers, all show the syllable [gɛg] with lowest F1
and highest F2, and [bæb] with highest F1 and lowest F2. For each vowel and
each speaker, the dorsal stop environment is reflected in a mean F1 that tends
to be lower in frequency and a mean F2 that is clearly higher than in the
labial stop context. We may note that for speaker S the syllable [bɛb] is
closer to [gæg] than it is to [gɛg], and that [gæg] is closer to [bɛb] than
it is to [bæb]. The difference in the apparent effectiveness of F1 and F2 as
indices of the [ɛ]-[æ] distinction is made clearer in Figure 3, in which the
data from speakers L and W, whose patterns are farthest apart, are plotted
together. The two syllable classes can be separated by a boundary at F1 =
c.blb Hz. But for F2, while the overall mean value is higher for the [ɛ]
syllables, combined speaker and context dependent effects yield some [ɛ]
syllable types with rather lower mean F2 frequencies than some [æ] syllables
show.

Figure 2. Means and standard deviations of vowels [bɛb, gɛg, bæb, gæg] from
four talkers, fifteen tokens of each syllable type per talker.

Figure 3. Comparison of two talkers, L and W, showing greatest differences in
mean values.

The measurement data analyzed for speaker S include F1 and F2 values of
[ɛ] and [æ] vowels in syllables terminated by [p] and [k] as well as [b] and
[g] (Figure 4). I expected that the shortening associated with final
voiceless stops in place of voiced stops would result in lower maximum F1
frequencies. Instead, as we see here, F1 is higher in [gɛk] than in [gɛg], and
likewise higher in the first of the pairs [bɛp]-[bɛb], [gæk]-[gæg], and
[bæp]-[bæb]. This effect of final devoicing is somewhat disconcerting, to
say the least. If we posit a single target value for all syllables sharing a
particular vowel quality, and we further assume that in the case of F1 any
failure to reach target is a matter of undershoot and not overshoot (in view
of the fact that all first formant trajectories in the syllables measured are
convex "upward"), then a syllable with higher F1 maximum should be closer to
target at the point of greatest oral opening. Moreover, a shorter syllable
should display greater undershoot, that is, a lower peak F1. Lindblom's
studies of vowel reduction (Lindblom, 1963) indicate that the shortening
ascribable to a global speedup of articulation or to destressing has this
effect. These data fail to conform to the expectation nurtured by the findings
of Lindblom (1963) and by Lindblom and Studdert-Kennedy (1967). Can we suppose
that the F1 targets for speaker S's [ɛ] and [æ] are more closely approximated
in, e.g., [gæk] than in [gæg], that is, that undershoot is incurred with the
voicing of the final stop, despite the fact that the duration of the vowel
gesture is at the same time slowed? Or can we perhaps entertain the notion
that if the offset frequency of F1 is higher before final voiceless stops,
this results in a prior raising of F1 that is detectable as early as the point
of maximum F1 for the syllable? Perhaps we might entertain the possibility of
overshoot, particularly if we imagine that [p] and [k] are produced with
greater articulatory force than are [b] and [g], unsupported as such an
allegation is, and that in consequence the preceding vowels are more
energetically articulated, with greater departure from the so-called "neutral"
vocal tract shape. The data now on hand need to be augmented before such
speculations warrant further discussion.

Figure 4. Comparison of F1-F2 frequencies for syllables differing in the
voicing of their final consonants, as produced by a single talker, S.



The data shown in Figure 5 allow us to compare F1-F2 values in
symmetrical stop contexts with those found in asymmetrical ones. In the
syllables [bɛg], [gɛb], [bæg], and [gæb] the first formants rise and then
fall, but the F2 trajectories move in only one direction. We may indeed better
suppose that the F2 trajectories traverse rather than undershoot any target we
might reasonably posit. It appears that in these syllables the F1-F2 values at
the point of maximum F1 are more powerfully affected by the postvocalic than
by the prevocalic stop. The tendency is for F1 to rise and F2 to descend in
the order [gɛg]-[bɛg]-[gɛb] and [gæg]-[bæg]-[gæb]. But this promising
regularity is marred by the data for [bɛb] and [bæb], which are not quite
nicely placed relative to [gɛb] and [gæb].

The final display (Figure 6) is of data collected to find out how some
other vowels with qualities close to those of [ɛ] and [æ] are placed in
relation to the latter on the F1-F2 plane. These are the vowels that are
represented as [e] and [ej]. The first has a quality that is distinct from
[æ] in the area of the country that includes New York City and Philadelphia;
it distinguishes the word halve from have, for example. The quality of [ej]
is usually described as diphthongal: the syllable [bejb] is a pronunciation
of the word babe. For these two additional syllable nuclei the same effects
of labial versus dorsal stop contexts are to be observed. Moreover, it
appears that, even though [gejg] and [bejb] are diphthongal as distinct from
the others ([e] is not noticeably diphthongal in L's speech), the single
measure of F1 and F2 being tested is as effective in separating [ej] from its
closest neighbor [e] as in distinguishing [e] from [ɛ]. On the other hand,
placement in the F1-F2 plane does not well separate [geg] from [gɛg] and
[bɛb]. The effect of substituting [b]s for [g]s as the neighbors of the vowel
[ɛ] is greater than replacing [ɛ] by [e] with the [g-g] context held constant.
This fact, if further analysis corroborates it as fact, suggests that the
context effects of stop place and voicing can be of a magnitude to put at some
risk any procedure of automatic vowel classification that depends on F1-F2
frequency measurements made at a single point in time and without regard to
context.

Figure 5. Comparison of F1-F2 frequencies for syllables symmetrical and
asymmetrical with respect to their pre- and post-vocalic
consonants, as produced by single speaker S.

Figure 6. Comparison of F1-F2 values for [ɛ, æ] with the phonologically
distinctive vowels [ej] and [e] in the pattern of speaker L.

To sum up then, F1 and F2 frequencies determined by LPC analysis at the
points of maximum first formant frequencies in stop-vowel-stop syllables
indicate that for the two "adjacent" vowels [ɛ] and [æ], the maximum F1
frequency is more stable over the set of syllables sharing the same vowel,
while F2 frequency varies more with the place of articulation of the flanking
stop consonants than it does with the vowel. However, the effect of devoicing
the postvocalic stop is more pronounced on F1 than on F2, its magnitude being
in fact as great or greater than that interpreted as a shift between [ɛ] and
[æ]. These differential effects appear to be similar for syllabic nuclei
other than [ɛ] and [æ], in particular the vowel [e] and the diphthongal [ej].



SYNERGIES: STABILITIES, INSTABILITIES, AND MODES*
E. Saltzman and J. A. S. Kelso†



Nashner and McCollum have addressed the question of whether muscle
synergies exist for complex skilled activity, and if so, how they are
organized (see also Kelso & Tuller, 1984, and Lee, 1984). The authors argue
that muscle synergies exist for postural stability tasks in the form of a
small set of discretely represented control entities, and that postural
corrective movements of the dynamically continuous musculoskeletal system are
organized through the operation of these discrete synergy elements. In this
commentary, we make two main points: first, that Nashner and McCollum's
arguments are not supported sufficiently by their data (i.e., the data do not
allow one to distinguish between their discrete synergy model and other model
types). We will describe the sort of data that would be convincing; and
second, because Nashner and McCollum stress the "universality and importance
of global schemes" for sensorimotor coordination and "principles governing the
interactions among elements" that lead to "testable hypotheses" we mention
briefly a theoretical framework that is attractive to us (e.g., Kelso, 1984;
Kelso & Saltzman, 1982; Kelso & Tuller, 1984; Kugler, Kelso, & Turvey, 1980,
1982) because it treats cooperative behavior in multicomponent systems as an
emergent consequence of the systems' underlying dynamics (e.g., Haken, 197^^).
We feel that this framework can (i) offer a firmer basis for some of Nashner
and McCollum's existing experimental observations; and (ii) promote an
experimental strategy that would illuminate Nashner and McCollum's hypothesis
of region-specific discrete synergies.

Nashner and McCollum describe distinct patterns of EMG bursts in response
to distinct patterns of postural perturbation (e.g., vertical or front-back
platform translation) in the context of given support conditions (e.g.,
different platform sizes). Each EMG pattern is characterized by a temporally
ordered sequence of bursts within a subset of three agonist-antagonist muscle
pairs (ankle, thigh, and trunk muscles). They hypothesize that each such
pattern or synergy operates with respect to a corresponding distinct control
structure. Each structure controls corrective postural movements within a
limited subregion of postural configuration space (e.g., ankle angle vs. hip
angle plane), such that when the body is perturbed the associated (fine-tuned)
EMG-burst pattern will return the body to a balanced posture. In principle,



*Slightly revised version of the authors' commentary on target article by
Nashner, L. M., and McCollum, G. The organization of human postural
movements: A formal basis and experimental synthesis. The Behavioral and Brain
Sciences, in press.

†Also Departments of Psychology and Biobehavioral Sciences, University of
Connecticut.
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however, such synergistic EMG patterns could also be generated by alternative
models (e.g., Litvintsev, 1972; Saltzman & Kelso, 1984) in which control laws
dependent on task (i.e., maintain balance), support conditions, and postural
configuration serve to continually specify corrective joint torque vectors
that return the body from an unbalanced to a balanced posture. If one defined
a further mapping from torque vectors to "muscle element" vectors (e.g.,
Jerard & Jacobsen, 1980; Saltzman, 1979) for which muscle elements were
activated only after inputs exceeded a given threshold, then ongoing
corrective torques would be mapped into patterns of discrete EMG bursts in
those muscles appropriate for producing the required torques. This sort of
control-law model augmented by thresholds for muscle element recruitment
should generate consistent "synergistic" patterns of postural EMG in response
to given types of destabilizing inputs, without reference to discretely
organized synergy control structures. For stabilizing movements initiated from
most locations in the postural configuration space, therefore, the above
discrete and control-law hypotheses predict qualitatively similar EMG activity
patterns. However, the discrete synergy model predicts that there will be
certain regions of the configuration space for which the EMG predictions will
be different for discrete and control-law models.
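
The control-law alternative can be made concrete with a small numerical
sketch. The Python fragment below is an illustration only, not a model taken
from the target article or from the control-law papers cited above. It assumes
a single joint angle, a proportional-derivative control law, unit inertia, and
an arbitrary recruitment threshold; the function names, the "flexor" and
"extensor" labels, and all parameter values are hypothetical. It shows how a
continuously specified corrective torque, passed through thresholded muscle
elements, could appear as an ordered sequence of discrete bursts.

```python
def simulate_threshold_bursts(theta0=0.2, k=40.0, b=4.0, inertia=1.0,
                              threshold=2.0, dt=0.001, duration=3.0):
    """Return torque and on/off traces for two thresholded muscle elements.

    Assumed (hypothetical) model: one joint angle theta, displaced by theta0
    from the balanced posture, is corrected by the continuous control law
    torque = -k * theta - b * omega. A "flexor" element is active only when
    torque exceeds +threshold, an "extensor" only when it is below -threshold.
    """
    steps = int(duration / dt)
    theta, omega = theta0, 0.0
    torque, flexor_on, extensor_on = [], [], []
    for _ in range(steps):
        tau = -k * theta - b * omega          # continuous control law
        torque.append(tau)
        flexor_on.append(tau > threshold)     # thresholded recruitment
        extensor_on.append(tau < -threshold)
        omega += (tau / inertia) * dt         # semi-implicit Euler step
        theta += omega * dt
    return torque, flexor_on, extensor_on


def burst_onsets(active, dt=0.001):
    """Return the onset times (s) of each burst in a boolean activity trace."""
    onsets = []
    previous = False
    for i, on in enumerate(active):
        if on and not previous:
            onsets.append(i * dt)
        previous = on
    return onsets


if __name__ == "__main__":
    torque, flexor_on, extensor_on = simulate_threshold_bursts()
    print("extensor burst onsets (s):", burst_onsets(extensor_on))
    print("flexor burst onsets (s):", burst_onsets(flexor_on))
```

Under these assumed values the two elements fire in a fixed order, one burst
each, although no discrete synergy is represented anywhere in the code.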

For the discrete control hypothesis, partitioning the configuration space
into distinct (possibly overlapping) synergy subregions implies that
borderlines (or border regions) will be defined between the different control
domains (see Nashner & McCollum's Figure 3). Nashner and McCollum's notion
implies that the system will behave differently along (or within) these
borders than when operating away from the borders. Further, when the postural
system adapts from one support condition to another (e.g., from long to short
platform lengths) the implication is that the border layout itself shifts
correspondingly. Let us focus on the "simpler" adapted case (e.g., repeated
trials with short platforms) for which border structure is assumed to be
relatively constant. In this instance, the control structures associated with
adjacent configurational domains should compete equally at the borders for
access to the final common paths of muscular output. There are at least four
possible outcomes of such competition: a) opposing effects will cancel each
other and no muscle activity will occur; b) competing synergies will be
observed simultaneously in a mixture of EMG patterns; c) there will be a
repetitive alternation or "jittering" between the EMG patterns of each
competing synergy; or d) a totally novel EMG pattern might be observed.
Experimental demonstration of any of these patterns near Nashner and
McCollum's hypothesized synergy borders in support-condition-adapted subjects
would provide strong support for the discrete model, since the control-law
model would not behave differently on, near, or away from those borders. These
data are lacking, however, or at least have not been presented in the target
article. The strongest data offered by Nashner and McCollum in favor of their
hypothesis are the sequential mixing of ankle and hip "synergies" during
adaptation to suddenly changed platform sizes (see Nashner & McCollum's
Figure 7). However, these findings seem equivocal at best given the
concomitant shifts in border structure that presumably accompany such
adaptation. Therefore, perturbation studies that use adapted subjects and that
explore a sufficiently large sample of the postural space could (i) help to
identify synergy borders and (ii) constitute a direct test of the discrete
synergy model.

The above suggestion exemplifies a general experimental strategy for
explicating the cooperative behavior of multicomponent, open, nonlinear
systems. A common feature of all such systems is that when control parameters

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are changed beyond certain critical values, new "modes" or spatiotemporal
patterns may appear (for many examples in physics, chemistry, and biology see
Haken, 1977, 1983; Prigogine, 1980; Yates, 1982; Yates & Iberall, 1973;
for examples in motor behavior, see Cohen, Holmes, & Rand, 1982; Kelso &
Tuller, 1984; von Holst, 1973). The beauty of this formulation is that the
modes (e.g., synergistic patterns) may themselves be described by a set of
dynamical equations derived via transformation procedures from the equations
describing the behavior of the original subsystems (e.g., muscle elements).
Under the influence of continuous scaling of control parameters, a previously
quiescent mode may suddenly become dominant and "capture" the behavior of the
subsystems. Such bifurcations result from the competition, as it were,
between the "forces" or inputs that are systematically scaled (corresponding,
for example, to the direction of platform translation), and the "forces"
holding the system together (i.e., the synergistic constraints among muscles).

In Figure 1 we show an example from our own work on cyclic behavior in a
parametrically scaled bimanual movement system exhibiting such a bifurcation.
In the figure, the displacement-time profiles of left and right hands are
plotted against each other on the Lissajous plane. Here the phase relation
between the movements of the right and left hands describes the spatiotemporal
ordering among corresponding flexor and extensor muscle activities. Starting
in the antiphase modal pattern (i.e., right flexion [extension] is accompanied
by left extension [flexion]), the parameter of movement frequency is
voluntarily increased in a continuous manner. As the frequency increases, the
antiphase mode becomes less stable as exemplified by the increase in phase
variance. At a critical value (which turns out to be a dimensionless function
of each individual's preferred cycling rate), the system bifurcates and a
different, in-phase, modal pattern appears (for a more complete analysis, see
Kelso, 1984). Extrapolating the above concepts to the postural domain of
Nashner and McCollum, we envisage one "discrete strategy" as giving way to
another at critical borders in the postural parameter space.
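
How the phase relation between the hands shows up on the Lissajous plane can
be sketched numerically. The fragment below is illustrative and uses idealized
sinusoids, not the recorded hand trajectories. It assumes that both hands'
displacements are signed the same way (flexion positive) and that relative
phase 0 means in-phase and pi means antiphase; the function names are
hypothetical. For pure sinusoids sampled over whole cycles, the correlation
between the two signals equals the cosine of the relative phase, so the
in-phase mode lies along a positively sloped line, the antiphase mode along a
negatively sloped line, and intermediate phases open into ellipses.

```python
import math


def lissajous_points(relative_phase, cycles=4, samples_per_cycle=200):
    """Sample left- and right-hand displacement over a whole number of cycles."""
    left, right = [], []
    for i in range(cycles * samples_per_cycle):
        angle = 2.0 * math.pi * i / samples_per_cycle
        left.append(math.sin(angle))
        right.append(math.sin(angle + relative_phase))
    return left, right


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    for label, phase in [("in-phase", 0.0),
                         ("quarter cycle", math.pi / 2),
                         ("antiphase", math.pi)]:
        left, right = lissajous_points(phase)
        print(label, round(pearson_r(left, right), 3))
```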


Several points, therefore, are pertinent to Nashner and McCollum's analysis.
First, transitions from one synergistic pattern of muscle elements to
another may be discontinuous even though the factors controlling the process
can change continuously. Second, discontinuities of muscular pattern (giving
rise to a description with apparently discrete properties) are observed not
because there are no intervening behavioral states, but because none of them
is stable (see possible experimental outcomes above). Thus, there may be a
large number of ways for a system to exhibit continuous change but only a
small number of ways for it to change discontinuously. To conclude,
therefore, that discrete logical control is imposed upon a continuous
mechanical system may not be warranted. Rather, synergistic muscular
activities may emerge as modal patterns from appropriately scaled
neuromuscular dynamical systems. Finally, although discrete logical states
could be used to represent distinct modal patterns, it should be recognized
that much of this apparent discreteness reflects the larger time constants of
the dominant modes relative to the time constants of the subsystems. With
reference to postural control, the synergistic patterning among muscles
appropriate to a given region of the associated parameter space is defined
over longer time spans than, say, those involved in motor unit recruitment.
Thus, the discrete-logical vs. continuous-dynamical distinction drawn by
Nashner and McCollum may be more apparent than real.
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Figure 1. Upper left box shows angular position-time profiles of left (top) and right (bottom)
hands. In the remaining plots, the left (x-axis) and right (y-axis) hands are on the
Lissajous plane (A-E). "Hands out of phase" means that flexion of one hand is accompanied
by extension of the other and vice versa. "Hands in phase" means that both hands flex and
extend at about the same time. Phase becomes less stable (1C) as evident in the widening
of the Lissajous trajectory, until an abrupt transition occurs (1D). Hand positions on
all plots are displayed in arbitrary units (from Kelso & Tuller, 1984).
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REPETITION AND COMPREHENSION OF SPOKEN SENTENCES BY READING-DISABLED CHILDREN* 


Donald Shankweiler,† Suzanne T. Smith,† and Virginia A. Mann††



Abstract. The language problems of reading-disabled elementary
school children are not confined to written language alone. These
children often exhibit problems of ordered recall of verbal
materials that are equally severe whether the materials are presented in
printed or in spoken form. Sentences that pose problems of pronoun
reference might be expected to place a special burden on short-term
memory because close grammatical relationships obtain between words
that are distant from one another. With this logic in mind,
third-grade children with specific reading disability and classmates
matched for age and IQ were tested on five sentence types, each of
which posed a problem in assigning pronoun reference. On one
occasion, the children were tested for comprehension of the sentences by
a forced-choice picture verification task. On a later occasion they
received the same sentences as a repetition test. Good and poor
readers differed significantly in immediate recall of the reflexive
sentences, but not in comprehension of them as assessed by picture
choice. It is suggested that the pictures provided cues that
lightened the memory load, a possibility that could explain why the poor
readers were not demonstrably inferior in comprehension of the
sentences even though they made significantly more errors than the good
readers in recalling them.

The problems of many children who are deficient in reading skills are not
confined to reading and writing, but extend to abilities involving spoken
language as well. Characteristically, the language tasks on which poor readers
are deficient place a burden on verbal short-term memory. For example, tasks
which require retention of spoken letter names (Shankweiler, Liberman, Mark,
Fowler, & Fischer, 1979), word strings, and sentences (Mann, Liberman, &
Shankweiler, 1980) have consistently distinguished poor readers in the early school
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years from their peers who are good readers. That the memory problems of the
poor readers are language-related is evident from the fact that they typically
perform at a level equivalent to good readers on tasks that involve memory for
nonlinguistic material such as photographs of faces (Liberman, Mann,
Shankweiler, & Werfelman, 1982), visual nonsense designs (Katz, Shankweiler, &
Liberman, 1981; Liberman et al., 1982), and visual-spatial sequences (Mann &
Liberman, in press).

The purpose of the research we describe here was to investigate the
abilities of third-grade children who differ in reading ability to repeat and
to comprehend a variety of spoken sentences. Our intent was to explore a
possibility that arises from our earlier research (Shankweiler et al., 1979;
Mann et al., 1980): that the limitation of verbal short-term memory, which is
found to be characteristic of children with reading disability, may be
associated with difficulty in spoken sentence comprehension. The expectation
that this association would be found was motivated by a consideration of the
need for an effective working memory during sentence processing. We assume
that a system must exist for holding the words of a sentence and their order
of occurrence in some kind of temporary store until the sentence structure can
be apprehended. This would follow from the fact that the meaning of a
sentence is not merely the sum of the meanings of the separate words it contains,
but is derived from the relations between the component words that determine
its syntactic and semantic structure. Given poor readers' problems in
remembering ordered sequences of words, they might be expected to make
mistakes in sentence processing whenever they are confronted with sentences
that place the working memory system under stress.



In addition to the sheer number of words a sentence contains, its lexical
content and manner of construction can be expected to affect how severely the
working memory is taxed in processing it. Sentences with unpredictable or
arbitrary semantic content may place a heavy load on working memory because they
force the listener to process them fully and perhaps more than once in order
to extract the content. The Token Test of De Renzi and Vignolo (1962)
contains such structures. This clinical diagnostic test, well-known to students
of aphasia, consists of sentence "commands" that request the subject to
perform arbitrary manipulations of the token objects. We have found a shortened
version of the Token Test (De Renzi & Faglioni, 1978) to distinguish groups of
good and poor readers in the third grade, but only on the complex structures
in the final sections of the test (Smith, Mann, & Shankweiler, in
preparation).



Since most of the Token Test items were insufficiently difficult to
separate the good and poor readers, we sought to develop a sentence test that
would be at once more sensitive and more analytic. The new measures were
designed to discover whether poor readers are selectively impaired in coping
with specific types of constructions that stress working memory more by their
syntactic form than by their semantic content. Frequently, close grammatical
relationships obtain between words that are distant from one another in the
string, as in some relative clause sentences in which the logical subject is
separated from its pronominal referent by a span of words. Sentences of this
form should be very difficult to comprehend if there is inaccurate retention
of the word string.
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We conducted two additional studies with the same groups of good and poor
readers who had received the Token Test. In planning these studies we sought
guidance both from the literature on acquisition of syntax by normal children
and from studies of sentence comprehension by adults with acquired aphasia.
In our first study (Mann, Shankweiler, & Smith, in press) we examined
sentences with relative clause structures in which we varied the point of
attachment of the relative clause to the main clause. We found that the poor
readers made more errors than the good readers on each of four sentence types, but
when the four types were ranked in order of difficulty for good and poor
readers separately, the ordering was the same for both groups. The finding that
the poor readers were generally worse in comprehension of relative clause
sentences, but within this broad class were affected by syntactic variations in
the same way that the good readers were, suggests that efficiency of working
memory, and not differential grasp of syntactic structure, is the
characteristic on which the groups are most readily distinguished.

Thus, the data from studies of sentence memory, the Token Test, and
comprehension of relative clause structures are consistent with the
possibility that poor readers have deficiencies in sentence processing that are an
expression of their difficulties in retaining verbal material in working
memory. However, we cannot exclude the possibility that other linguistic
deficiencies are present in these children. Although our research to date has
not identified any constructions on which poor readers are selectively
impaired, we have found that such children usually make more errors in sentence
processing than good readers of comparable age and IQ (Mann et al., in press).
Poor readers' failures to process sentence materials accurately could reflect
memory limitations primarily, as we have suggested, or alternatively, such
failures could be symptoms of delayed acquisition of portions of the grammar,
as Byrne (1981) has proposed. The possibility that poor readers may have
primary syntactic deficits deserves thorough systematic study in which a variety
of syntactic structures is examined.

The study we describe here begins to address this need. It focuses on
attribution of reference in sentences containing a reflexive pronoun. Our
reasons for selecting this problem from among the many possibilities for
approaching sentence comprehension were two. First, pronoun reference is
tightly governed by syntactic constraints. Since correct attribution of
coreference of a reflexive pronoun requires that the perceiver recover the syntactic
structure of the whole sentence, comprehension of pronoun reference is a test
of sensitivity to grammatical structure. Second, there is evidence that
aphasia in adults is often associated with problems in assigning reference to
reflexive pronouns. Our study was inspired by an investigation of
comprehension of the reflexive by Blumstein, Goodglass, Statlender, and Biber (1983).
These investigators compared comprehension of sentences in which a reflexive
pronoun is coreferent to an immediately preceding noun phrase, with that of
sentences in which the reflexive is coreferent to a noun phrase that occurred
earlier in the sentence. Examples 1a and b illustrate these types:

1a The chef watched the boy bandage himself.
1b The chef watching the boy bandaged himself.

Using a two-choice picture-verification task to probe subjects' comprehension
of the coreferent of the reflexive in sentences such as 1a and 1b, Blumstein
et al. (1983) found that all aphasic subgroups performed better on 1a than on
1b. Indeed, they performed at chance on sentences like 1b, that cannot be

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successfully comprehended by adherence to a processing strategy in which
pronoun reference is inflexibly attributed to the nearest preceding noun
phrase. Blumstein et al. (1983) concluded that the aphasic subjects failed to
process fully the syntactic structure of sentences like 1a and 1b, and that
they apparently had a tendency to revert to the immature "minimum distance"
strategy often attributed to young children (Chomsky, 1969).

Further motivation for our decision to examine children's comprehension
of constructions containing reflexive pronouns came from studies that
specifically examined developmental changes in pronoun comprehension. Solan (1981)
has shown that children of age five or younger recognize the basic constraints
on reflexive pronouns. It must be acknowledged, however, that young children
do make mistakes in processing pronouns. We note in this connection findings
of Read and Hare (1979), who suggest that certain nuances of pronoun use,
which turn on the correct parsing of sentences involving more than one clause,
may be late to mature. Among a group of children aged six to twelve studied
by these investigators, only the oldest subjects in the sample gave
grammatically correct interpretations to all types of multiclause constructions that
incorporated reflexive pronouns, and even the most successful were not as
consistent as adult subjects. Thus, although children may very early apprehend
constraints on pronoun reference, considerable individual variation in
sophistication in handling reflexive pronouns in multiclause structures seems
to exist, giving ample scope for differences between good and poor readers at
the third-grade level.

Attribution of pronoun reference seemed, then, to be an important area
for further investigation. Accordingly, our study was designed to assess
comprehension and immediate recall of sentences containing pronouns.
Third-grade children who were good and poor readers were first tested for
sentence comprehension by a picture verification test; in a subsequent session
on a different day the same sentences were presented for immediate recall.



Method 



Subjects



The subjects were 35 third-grade children attending the public school
system of a small northeastern city. All were native speakers of English with
no known speech or hearing deficiencies, who had an intelligence quotient of
90 or better, as measured by the Peabody Picture Vocabulary Test [citation
illegible]. Their inclusion in the experiment was initially based on teachers'
evaluations of reading ability, and confirmed by scores on the reading subtest
of the Iowa Test of Basic Skills (Hieronymus & Lindquist, 1978), which had been
administered approximately four months before our study. Three boys and fifteen
girls whose mean Iowa grade-equivalent score was [illegible] (range =
[illegible] to [illegible]) comprised the good reader group; nine boys and
eight girls whose mean Iowa grade-equivalent score was [illegible] (range =
[illegible] to [illegible]) comprised the poor reader group. The groups did not
differ significantly in IQ ([illegible] for good readers and 107.7 for poor
readers), nor in age ([illegible] months for good readers; [illegible] months
for poor readers).
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Materials 



The test materials (see Appendix) consisted of eight tokens of each of
five sentence types. Each sentence poses a problem in perception of pronoun
reference. A sample set appears below:

A) The fireman watched the soldier bandage himself. 

B) The fireman watching the soldier bandaged himself. 

C) The fireman bandaged her. 

D) The soldier bandaged himself. 

E) The soldier bandaged him. 

Type A sentences are declarative sentences in which the reflexive pronoun 
occurs in a relative clause modifying the object of the main clause, thus 
causing the referent of the reflexive to be the object of the main clause. 
The pronoun reference can be correctly assigned following the minimum distance 
principle, since the pronominal referent is the agent immediately preceding 
the reflexive pronoun. Type B sentences are declarative sentences with a
single, center-embedded, relative clause that modifies the subject of the main
clause, thus causing the referent of the reflexive pronoun to be the subject
of the main clause. In contrast to Type A sentences, the referent of the
pronoun in Type B sentences cannot be correctly assigned by following the minimum
distance strategy, since it is the agent most remote from the reflexive.

The remaining three types of sentences were controls designed to assess
comprehension of personal and reflexive pronouns in single-clause sentences.
Type C sentences tested the comprehension of personal pronouns, incorporating
gender difference as a cue for establishing reference. Types D and E tested
comprehension of reflexive and personal pronouns, respectively, without the
gender cue.

Eight sentences of each type were constructed using noun agents that can
be unequivocally represented and verbs that refer to actions that can be
illustrated clearly in drawings. Half of the sentence sets employed male
agents and half employed female agents, with Type C sentences incorporating
agents of different sexes. The 40 test sentences were randomized and recorded
by a speaker who read each one aloud with natural intonation. Each sentence
was preceded by an alerting stimulus (a bell).



The tape for the repetition task was recorded separately. It included
the original sentences of the comprehension test interspersed with an
additional eight control sentences. These control sentences equalled or slightly
exceeded the length of Type A and B sentences and incorporated the same agents
and actions, but lacked reflexive pronouns. Each was of the form "The nurse
and the policewoman sprayed water on the flowers" (see Appendix).

Picture-verification test. In order to assess the ability of subjects to
comprehend the reflexive pronoun in each type of construction, we devised a
four-alternative, forced-choice picture verification task in which subjects
were presented with a two-by-two array of line drawings and were asked to
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point to the drawing that most accurately depicted the meaning of the sentence
they heard. The response array for each sentence included four 5 x 3 [fraction
illegible] inch pictures, one correctly depicting sentence meaning, and three
foils, each depicting an incorrect interpretation of the sentence. Each picture
displayed two agents; the placement of the agents remained constant within an
array, and was varied randomly across arrays. The position of the correct
picture and the three different foils was varied so that each appeared with
equal frequency in each of the four possible positions within the array.
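
One way to obtain the equal-frequency property just described is a cyclic
(Latin-square) rotation of the four pictures over array positions. The sketch
below is an assumption for illustration; the text does not say how the
positions were actually assigned, and the names PICTURES, balanced_layouts,
and position_counts are hypothetical. In practice the resulting layouts would
presumably also be assigned to sentences in a random order.

```python
from collections import Counter

# Hypothetical labels for the four pictures in each response array.
PICTURES = ["correct", "foil 1", "foil 2", "foil 3"]


def balanced_layouts(n_items):
    """Return one picture ordering per item by rotating the four pictures.

    Index 0-3 in each ordering stands for one of the four array positions.
    n_items must be a multiple of four for the counts to balance exactly.
    """
    if n_items % len(PICTURES) != 0:
        raise ValueError("n_items must be a multiple of 4")
    layouts = []
    for item in range(n_items):
        shift = item % len(PICTURES)
        layouts.append(PICTURES[shift:] + PICTURES[:shift])
    return layouts


def position_counts(layouts):
    """Count how often each picture type lands in each array position."""
    counts = Counter()
    for layout in layouts:
        for position, picture in enumerate(layout):
            counts[(picture, position)] += 1
    return counts


if __name__ == "__main__":
    counts = position_counts(balanced_layouts(40))  # 40 test sentences
    for key in sorted(counts):
        print(key, counts[key])
```

With 40 test sentences, each picture type falls in each position ten times.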

The foils for sentence Types A and B provided the critical measures.
Foil 1 for Type A sentences depicted the reflexive pronoun contained in the
subordinate clause as incorrectly attributed to the subject of the main
clause. Foil 1 for Type B sentences correctly depicted the actions expressed
by each verb, but depicted the reflexive as incorrectly attributed to the
object of the subordinate clause. This foil provided the test of whether
subjects were following a minimum distance strategy, an assignment that was
characteristic of adult aphasics studied by Blumstein et al. (1983). Foil 2 for
both Type A and B sentences allowed a test of whether the subject had attended
to the entire sentence. This foil depicted the correct attribution of the
reflexive to its referent, but incompletely represented the relation between
the agents indicated by the first verb. For example, in sentence A (see
above), the nurse is not watching the policewoman, and in B, the policewoman
[illegible] watching the nurse. Foil 3 for A and B sentences allowed the
reflexive pronoun to be interpreted as a personal pronoun.

Foils for the control sentences (C, D, and E) were as follows: Foil 1
depicted reversed roles of the two noun agents. Foil 2 depicted the pronoun
incorrectly; i.e., personal pronouns in Type C and E sentences were depicted
as reflexive pronouns, and reflexive pronouns in Type D sentences were pictured
as personal pronouns. Foil 3 depicted a role reversal and misrepresented the
pronoun as described above.

Procedure 

Subjects were tested individually in two half-hour sessions. The
comprehension test was administered first, followed by the repetition test at
least one week later. When testing comprehension, the examiner placed the
relevant array of pictures before the subject immediately prior to the
initiation of each tape-recorded sentence. The decision to expose the picture array
before sentence onset was dictated by a concern not to overload short-term
memory. Subjects were instructed to listen to the whole sentence, to examine
each of the four pictures, and then to point to the one that best showed what
the sentence meant. Emphasis was placed upon listening to the entire sentence
before pointing, and choosing the picture only after examining all of the
alternatives. A bell signalled the onset of each test sentence. If a subject
requested that a sentence be repeated, the experimenter replayed the sentence
once, noting the repetition on the score sheet.

In the sentence repetition task, subjects were instructed to listen to
each taped sentence and to repeat it back immediately. Each sentence was
played only a single time. If a child requested that a sentence be repeated,
the examiner encouraged him to report as much as could be remembered. The
responses were transcribed by the experimenter during the session and also
preserved on tape for later error analysis.
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Results

Sentence Repetition

The repetition data were analyzed both in terms of the number of
incorrectly recalled sentences, and in terms of the total number of individual
errors made, including omissions, substitutions, reversals, tense changes, and
pronoun errors within each sentence. The results of each scoring procedure
are summarized in Table 1 for each type of sentence (the five test types A-E
and the additional control type), separately for good and poor readers.



Table 1

Sentence Repetition: Mean number of sentences incorrectly recalled (max = 8)
and mean number of words incorrectly recalled in sentences of each type

                              Reader Group
                      Good (N = 18)    Poor (N = 17)
Sentence Type         Mean (SD)        Mean (SD)

Sentences
   A                  2.22 (?.55)      3.41 (1.70)
   B                  2.06 (2.01)      3.?2 (2.19)
   C                  0.39 (0.92)      1.00 (0.79)
   D                  0.22 (0.55)      1.23 (1.09)
   E                  0.11 (0.32)      0.88 (0.99)
   Control            2.00 (1.68)      2.70 (1.83)

Words
   A                  3.06 (2.31)      4.9? (2.33)
   B                  3.89 (5.26)      7.35 (?.??)
   C                  0.39 (0.98)      1.00 (0.79)
   D                  0.22 (0.55)      1.?? (1.28)
   E                  0.11 (0.32)      1.06 (1.25)
   Control            3.33 (3.27)      5.?7 (?.39)

Note. ? marks a digit that is illegible in the source.



Poor readers made more errors than good readers on both the number of
sentences and the number of words to be recalled. Pearson product-moment
correlation coefficients were computed for each error measure and the reading
scores from the Iowa test. Each was negatively correlated with reading
ability: r(35) = -.48, p < .01 for sentences; r(35) = -.45, p < .01 for words.
Each set of error measures was also subjected to an analysis of variance in
which type of sentence (Types A-E and the control sentences) was the
within-subjects factor and reading group the between-subjects factor.
Significant main effects were obtained for type of sentence, both for number
of sentences incorrectly recalled, F(5,165) = 37.81, p < .001, and number of
words, F(5,165) = 21.97, p < .001. The effects of reader group were also
significant: F(1,33) = 8.80, p < .006 for sentences, F(1,33) = 6.40, p < .0175
for words. However, there was no interaction between reading ability and the ef-




Table 2 

Distribution of repetition errors according to word class, and error type 

(Mean nuiriDer errors per subject) 



ERROR TYPE 



RfiADER Gf^DUP: ' 

/ 

UOHD CLApS 
llouns/ 
Varbd 

Article 
Pfep./Conj . 

CEMT 

ERRORS: 




Substitution 



Deletion 



' Intrusion 



Inflection^ 




readers: N*18; poor re^(ier»: N«17 



^Itot applicable for articles, pronouns, prepositions, and Gor\)unctions 



PERCENT OF 
TOTAL ERRORS 



C3ood 


Poor 


Good 


Poor 


Good 


Poor 


Good 


Poor 


Good 


Poor 




6.06 


0,55 


1 .23 


0.00 


0.06 


0.39 


0.12 


39.8 


35.2 


0.83 


1.17 


0.28 


0.53 


0.00 


0.1? 


1 .89 


3.^7 


27.3 


2^ .| 


1 .00 


3.76 


0.55 


0.^<1 


0.17 


0,29 


NA 


NA 


15.6 


21 .0 


1 ,00 


'l..| 


0.55js 


1.53 


0.00 


0.00 


NA 


NA 


l^i.l 


1^4 ,2 


0.1 1 


0.35 




0.23 


0,06 


0.41 


' NA 


NA 


3.1 




58.0 


60. i* 


19.1 


18.5 


2.1 • 




20.7 


16.9 
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i'Act Of ;mt«i»e type, fw otiildren in reading groMp«( MorePerfor* were 
wde on Typ^^ 15 senteiKNHi sun 1 en tt^HnicfNHf cofHt*ol sentences , ttian isn 
Types C D, and E, t(33) - 6.87, p < .001. 

Table 2 displays the distribution of errors for each reader group
according to error type and word class. The greatest proportion of errors for
both reader groups occurred on nouns and verbs. Substitutions within word
class (e.g., saying "a" for "the," "fireman" for "farmer," "hisself" for
"himself") make up the greatest proportion of errors for both reader groups.
The proportion of deletion errors (deletion of whole words) and errors
involving inflections (e.g., omission of the possessive "s"; omission or
change of verb tense markers) was comparable for each group. Intrusions, i.e.,
inserting extra words into a sentence, occurred rarely. It is apparent from
Table 2 that although the poor readers made more errors than the good readers
in most error categories, the distribution of the errors is highly similar in
the two groups.

Sentence Comprehension

Having established that the poor readers were less accurate in verbatim
repetition of the test sentences, we turned next to the results of the measure
of sentence comprehension, the four-choice picture verification test. The
initial analysis was performed on the number of error responses made on each
sentence type (A-E). The correlation between total errors and the Iowa score
yielded a nonsignificant value of r(35) = -.14. Analysis of variance for the
factors sentence type and reader group revealed a highly significant effect of
sentence type, F(4,132) = 38.06, p < .001, but no significant difference
between children in the two reading groups, F(1,33) = 0.40. Moreover, there
was no interaction between individual sentence type and reader group,
F(4,132) = 1.53.
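The three correlations with the Iowa reading score reported so far (-.48 and -.45 for the repetition measures, -.14 for comprehension errors) can be checked for significance with the usual conversion of r to a t statistic. The sketch below is only an illustration: it assumes that all 35 children (18 good and 17 poor readers) entered each correlation, so that df = 33, and it uses an approximate tabled critical value; neither assumption is stated in the text.

```python
import math

def t_from_r(r: float, n: int) -> float:
    """Convert a Pearson r based on n paired scores to a t statistic (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)

# Assumption: every child contributed to every correlation.
N_SUBJECTS = 18 + 17
# Approximate two-tailed .01 critical value of t for df = 33 (standard t tables).
T_CRIT_01 = 2.733

# Correlations with the Iowa reading score, as reported in the text.
reported = {
    "repetition, sentences": -0.48,
    "repetition, words": -0.45,
    "comprehension, total errors": -0.14,
}

for label, r in reported.items():
    t = t_from_r(r, N_SUBJECTS)
    exceeds = abs(t) > T_CRIT_01
    print(f"{label}: r = {r:+.2f}, t({N_SUBJECTS - 2}) = {t:.2f}, "
          f"exceeds .01 criterion: {exceeds}")
```

Under these assumptions the two repetition correlations exceed the .01 criterion and the comprehension correlation does not, which agrees with the significance levels given in the text.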

Table 3 shows a breakdown of the errors by sentence type and serves to
confirm the absence of interaction between the reading groups. It may be seen
that many more errors occurred on sentences A, B, and E than on C and D. The
difference between A and B, on the one hand, and C and D, on the other, was
expected. The comparatively high error rate on Type E may have occurred for a
special reason.³
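The percent column of Table 3 appears to express each sentence type's mean errors as a share of the group's total mean errors across Types A-E. The short sketch below illustrates that reading with the poor readers' means as read from Table 3; the interpretation of the column is an inference from the figures, not something the text states, and small discrepancies are expected because the printed means are rounded.

```python
# Mean errors per sentence type for the poor readers, as read from Table 3.
poor_means = {"A": 1.35, "B": 3.88, "C": 0.35, "D": 0.59, "E": 3.29}

# Percent column for the poor readers, as read from Table 3 (for comparison only).
printed_percent = {"A": 14.27, "B": 41.01, "C": 3.70, "D": 6.23, "E": 34.79}

def percent_of_total(means: dict) -> dict:
    """Express each mean as a percentage of the sum of all means."""
    total = sum(means.values())
    return {key: 100.0 * value / total for key, value in means.items()}

computed = percent_of_total(poor_means)
for sentence_type, value in computed.items():
    print(f"Type {sentence_type}: computed {value:.2f}%, "
          f"printed {printed_percent[sentence_type]:.2f}%")
```

Each computed value falls within about 0.02 of the printed percentage, which supports reading the column as percent of total errors rather than percent of the maximum possible score.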

A detailed analysis of the error pattern was undertaken in which choice
of foils was examined for the critical sentence types A and B, which were
designed to indicate whether poor readers tend to adopt a minimum distance
strategy in assigning a referent to the reflexive pronoun. An analysis of
variance was performed on this portion of the error data, in which the factors
were sentence type, foil type, and reader group. There was a significant
effect of sentence type, F(1,33) = 31.53, p < .001, and foil type,
F(2,66) = 4.64, p < .02. Moreover, there was an interaction of foil type and
reading ability, F(2,66) = 4.02, p < .03. However, there was no interaction of
foil type x sentence type x reading ability.

Table 3

Sentence Comprehension: Mean number and percent of errors on sentences of
each type (max=8)

                                   Reader Group
                   Good (N=18)                 Poor (N=17)
Sentence Type      Mean (SD)      Percent      Mean (SD)      Percent

A                  1.56 (0.92)    18.35        1.35 (1.00)    14.27
B                  3.11 (2.08)    36.59        3.88 (2.1?)    41.01
C                  0.78 (0.65)     9.18        0.35 (0.70)     3.70
D                  0.50 (0.71)     5.88        0.59 (0.71)     6.23
E                  2.55 (2.12)    30.00        3.29 (1.79)    34.79

The distribution of errors across the foils for Type A and B sentences is
shown in Table 4. The figures in this table are a breakdown of the error
means shown in Table 3 according to foil type. Foil 1 in Type B sentences
provided the critical test of adherence to the minimum distance principle.
Choice of this foil would indicate that in the assignment of pronoun
reference the subject is using a minimum distance strategy in lieu of full
syntactic analysis. This was the error that aphasic patients, studied by
Blumstein et al. (1983), tended to make. The subjects of the present study
also showed a tendency to make this error, that is, tended to assign the
reference to the agent in closest proximity rather than to the referent
dictated by the syntax. However, although the poor readers selected Foil 1
more frequently than good readers, the difference was not confined to Type B
sentences, as indicated by the lack of a three-way interaction among sentence
type, foil type, and reader group. The poor readers tended instead to make
more errors on Type 1 foils for all sentence types, t(33) = 1.92, p < .05,
suggesting that their difficulty cannot be understood as an inordinate
reliance on the minimum distance strategy. Had this been the case, the poor
readers would not have made more Foil 1 errors than the good readers on Type
A sentences, in which Foil 1, in violation of the minimum distance principle,
incorrectly attributed the reflexive to the subject of the main clause. As
for the other foils, any differences between good and poor readers failed to
reach significance. Selection of Foil 2, which controlled for inattention to
the first verb of the sentence in both Type A and B sentences, occurred only
rarely in either sentence type. Foil 3, which depicted the reflexive pronoun
as a personal pronoun in both sentence types, was selected slightly more
frequently, but differences between reader groups were minimal.

Table 4

Distribution of Errors by Foil Type for Sentence Types A and B: Mean number
of errors

                                        Reader Group
                                  Good (N=18)      Poor (N=17)
                    Foil Type     Mean (SD)        Mean (SD)

Sentence Type A         1         0.00 (0.00)      0.18 (0.39)
                        2         0.56 (0.70)      0.29 (0.84)
                        3         1.00 (0.6?)      0.88 (0.78)

Sentence Type B         1         1.50 (1.85)      2.82 (2.76)
                        2         0.78 (0.73)      0.41 (0.62)
                        3         0.83 (0.99)      0.65 (0.70)

Selection of foils on the control sentences (C, D, and E) also showed no
reader group differences. The few errors that occurred on Type C and D
sentences involved primarily Foil 2, that is, treating a personal pronoun as a
reflexive, or vice versa. As we mentioned earlier, somewhat more errors
occurred on Type E sentences. These errors predominantly involved personal
pronouns in locative constructions (Sets 4 and 6 in Appendix) and indirect
object constructions (Sets 2, 5, and 8 in Appendix) having been misinterpreted
as reflexive pronouns (choice of Foil 2). Such misinterpretations are common
to many young children and may reflect a tendency to "flatten" embedded
structures (Tavakolian, 1981).


Discussion 

This study was undertaken as part of a continuing investigation of the
nature of language impairment in children who fail to make expected progress
in learning to read. Here we have asked whether poor readers' problems with
language extend to the processing of multiclause spoken sentences involving
attribution of pronoun reference. To this end we have tested good and poor
readers' repetition and comprehension of the same set of sentences.

With respect to repetition, more errors occurred on the longer, complex
sentences. Structural differences between sentences matched for length were
not significantly reflected in error rates, although fewer errors tended to
occur on sentences that could be interpreted by following the minimum distance
principle (Type A). The poor readers overall were less accurate than the good
readers in repeating sentences of every type. Sentence type did not
significantly affect the extent of differences related to reading ability when
the data are examined for number of correct responses and for the pattern of
errors. This is in keeping with a finding we reported earlier (Mann et al.,
1980) in which it was demonstrated that good and poor readers, similar to the
present subjects, though 1 year younger, differed markedly in recall of both
meaningful and meaningless sentences, but the differences were constant across
a variety of sentence structures. The results of both studies are consistent
with the line of evidence that implicates working memory in the language
deficits of poor readers.
The test for comprehension of the sentences, the picture verification
task, revealed appreciably more errors on complex sentences than on simple
ones. The errors were confined chiefly to multiclause constructions and to
the specific locative and indirect object structures that have been identified
by other investigators as sources of potential confusion in young children
(e.g., Read & Hare, 1979; Roeper, 1982; Solan, 1981). The comparison of
greatest interest, between sentences that can be interpreted by following the
minimum distance principle (Type A) and those that cannot (Type B), revealed
that significantly more errors occurred on the latter, suggesting that the
children in our study resorted occasionally to immature parsing strategies.
Unlike the repetition test, however, the picture verification test of
comprehension did not significantly distinguish the good and poor readers.
Such difficulties as the subjects did encounter were common to both groups of
children. The children's difficulties with the more complex structures were
minor in comparison to the problems that the aphasic patients of Blumstein et
al. (1983) encountered with similar sentences. The aphasics performed at
chance level on all sentences in which the structure did not allow application
of the minimum distance principle, and, indeed, they failed to interpret
reflexive pronouns correctly even in simple sentences.

Though these results did not reveal the expected differences between the
good and poor readers in comprehension of complex sentences containing
reflexive pronouns, we must acknowledge, and take account of, other
indications that our good and poor readers are not wholly equivalent in their
abilities to comprehend spoken sentences. First, we should note that the
children in our two reading groups did not perform equivalently on the reading
subtest of the Iowa Test of Basic Skills. The inferior performance of the
poor reader group on this test of reading comprehension does not necessarily
indicate language processing limitations as such; it may instead reflect
limitations that are specific to written language, such as slow and inaccurate
word decoding. By studying comprehension of spoken sentences, we hoped to
gain a perspective on possible language comprehension limitations, independent
of specific reading difficulties. In this connection, it is appropriate to
refer to a companion study to the present one in which we tested the same
groups of subjects on a different occasion with a different set of sentences
(Mann, Shankweiler, & Smith, in press). In that study, unlike the present
study, the poor readers displayed a significant deficit in comprehension.
There, the method of testing was by object manipulation, not picture
verification. Thus the answer to the question of whether the poor readers are
below par on comprehension may depend on which structures are assessed and on
the method of testing.

Little information is presently available about the capabilities of good
and poor readers to comprehend various types of sentences. A recent study by
Byrne (1981), which came to our attention after this experiment and the one of
Mann et al. were completed, also finds differences in sentence comprehension
(as tested by object manipulation) on some sentence types but not on others.
The sentences that separated the reader groups in Byrne's study contained
unusual constructions and semantic anomalies. Having found that some shorter
sentences distinguished the reader groups more readily than longer ones, Byrne
argued that memory factors could not be responsible for the differences. This
conclusion does not necessarily follow. As we noted earlier, more is involved
in memory-related difficulty than sentence length alone. Anomalous sentences,
even though they are short, may place extra demands on memory because they
are likely to be misinterpreted on first construal and therefore have to be
"replayed" from memory in order to establish their structure properly. Such
rehearsal would require complete retention.

In regard to the method of testing, we may speculate that the picture
verification task of the present study may have stressed short-term memory
less than the "acting out" manipulation task of Mann et al. (in press). It is
pertinent that in the present experiment, the subjects were allowed to inspect
the sheet containing the four multiple-choice picture foils as the sentence
was being read, a procedure that could be expected to minimize the need for
rehearsal. In contrast, the manipulation procedure of the Mann et al. study
merely presented the child with a random arrangement of the three relevant
actors (toy animals) in advance of presentation of the sentence. It is clear
that the picture test gives more concurrent information, and thus might be
expected to stress working memory significantly less. This speculation is
supported by the findings of Elmore-Nicholas and Brookshire (1981), in which
performance of aphasic adults on a sentence verification task was facilitated
by the presence of pictures. Thus, there may be no real inconsistency in the
findings of the two studies that tested sentence comprehension in these
subjects. Conceivably, the present experiment failed to detect real
differences between the reading groups because the method of testing did not
give adequate scope for differential performance.

In summary, the poor readers of this investigation were less accurate
than the good readers in immediate recall of sentences containing reflexive
pronouns, but were not deficient in comprehension of the same sentences. They
were deficient, however, both in recall and interpretation of another set of
complex sentences, as reported by Mann et al. (in press). We suspect that the
comprehension testing conducted with these children yielded inconsistent
results because the picture verification procedure used to test reflexive
pronouns was insufficiently sensitive. The performances of the poor readers
did not closely resemble those of the adult aphasics studied by Blumstein et
al. (1983). Unlike the aphasics, neither good nor poor readers displayed
rigid adherence to a minimum distance strategy for determining pronoun
reference. Nevertheless, reading disabled children, the present group
included, have not typically been found to be the equals of good readers in
processing spoken sentences (Byrne, 1981; Mann et al., in press), nor, as we
have noted, in the use of short-term memory codes which so often are impaired
in aphasia (Goodglass, Denes, & Calderon, 1974; Martin & Caramazza, 1982). It
seems important, therefore, to explore fully the relations between short-term
memory deficits and sentence-processing deficits, and in this regard to seek a
better understanding of the similarities and differences between
developmental language disorders, such as specific reading disability, and
linguistic deficits in the acquired aphasias.
Footnotes

¹Nor can we exclude the possibility that the strategies they employ on
certain other cognitive tests may be deficient (see Wolford & Fowler, 1984).

²The groups were thus not equivalent in the proportion of boys and girls.
We do not regard this as a serious imbalance, however, since research has
shown that the patterns of deficits characteristic of children with reading
disability do not vary with the sex of the child (Liberman & Mann, 1981).

³The higher error rate on Type E sentences than on Types C and D, which
were matched with these for length, requires comment. E sentences were
designed as controls to test basic grasp of pronoun use, and therefore few
errors were anticipated from children in the age range of our subjects. The
analysis revealed that the principal error on this sentence type was to
interpret a pronoun as though it were a reflexive. Thus, the sentence "The
astronaut poured him a drink" was interpreted to mean that the astronaut
poured a drink for himself. We speculate that this interpretation reflects a
dialect preference and not a genuine confusion in assigning pronoun
reference. In support of this, we note that on Type C sentences, where
reference is established by gender, such misinterpretations practically never
occurred.

Appendix 

Sentences used in comprehension and repetition

Set

I.    A. The fireman watched the soldier bandage himself.
      B. The fireman watching the soldier bandaged himself.
      C. The fireman bandaged her.
      D. The soldier bandaged himself.
      E. The soldier bandaged him.

II.   A. The astronaut watched the sailor pour himself a drink.
      B. The sailor watching the astronaut poured himself a drink.
      C. The sailor poured her a drink.
      D. The astronaut poured himself a drink.
      E. The astronaut poured him a drink.

III.  A. The farmer watched the Indian pull himself up the rope.
      B. The farmer watching the Indian pulled himself up the rope.
      C. The policewoman pulled him up the rope.
      D. The Indian pulled himself up the rope.
      E. The farmer pulled him up the rope.

IV.   A. The clown watched the boy spill paint on himself.
      B. The boy watching the clown spilled paint on himself.
      C. The girl spilled paint on him.
      D. The clown spilled paint on himself.
      E. The boy spilled paint on him.

V.    A. The girl watched the grandmother make herself a sandwich.
      B. The girl watching the grandmother made herself a sandwich.
      C. The Indian made her a sandwich.
      D. The grandmother made herself a sandwich.
      E. The girl made her a sandwich.

VI.   A. The nurse watched the policewoman spray perfume on herself.
      B. The policewoman watching the nurse sprayed perfume on herself.
      C. The clown sprayed perfume on her.
      D. The nurse sprayed perfume on herself.
      E. The nurse sprayed perfume on her.

VII.  A. The waitress watched the ballerina dress herself.
      B. The waitress watching the ballerina dressed herself.
      C. The nurse dressed him.
      D. The ballerina dressed herself.
      E. The waitress dressed her.

VIII. A. The witch watched the queen pick herself a flower.
      B. The queen watching the witch picked herself a flower.
      C. The queen picked him a flower.
      D. The witch picked herself a flower.
      E. The queen picked her a flower.

Control Sentences (repetition)

1. The sailor and the fireman poured coffee from the pot.
2. The astronaut and the sailor bandaged the boy's hand.
3. The boy and the Indian pulled the sled up the hill.
4. The clown and the farmer spilled paint on the sidewalk.
5. The queen and the grandmother made sandwiches for lunch.
6. The nurse and the policewoman sprayed water on the flowers.
7. The witch and the ballerina dressed for the party.
8. The waitress and the girl picked flowers in the park.






SPELLING PROFICIENCY AND SENSITIVITY TO WORD STRUCTURE*

William Fischer,† Donald Shankweiler,†† and Isabelle Y. Liberman††



Abstract. The connection between spelling and pronunciation in many
English words is somewhat remote. To spell accurately, a writer may
need to appreciate that the orthography maps regularities of more
than one kind. Two experiments explored the possibility that young
adults who differ in spelling ability also differ in sensitivity to
morphophonemic structure and word formational principles that
underlie the regularities of English spelling. In the first, an analysis
of misspellings showed that poor spellers were less able than good
spellers to exploit regularities at the surface phonetic level and
were less able to access the underlying morphophonemic structure of
words. A second experiment used pseudowords to extend these
findings and to confirm that spelling competence involves apprehension
of generalizations that can be applied to new instances.

All would agree that English spelling is not easily mastered. Even
accomplished readers and writers may at times be uncertain about the spelling
of particular words. There is less agreement about why English causes so much
difficulty. The reason most often given for spelling failures is the supposed
irregularity of English orthography. This diagnosis, though popularly
accepted, is a misleading oversimplification. It reflects the widespread
confusion about how the orthography represents word structure. An example
will serve to illustrate that when English spelling departs from one-to-one
correspondence with pronunciation, as it so often does, it may nevertheless
preserve orderliness at some other level. The plural s in cats receives an
s-sound while the s in dogs is pronounced as z. We do not balk at this
inconsistency perhaps because the convenience of representing the plural
morphophoneme in a consistent way overrides considerations of strict
one-to-one correspondence with pronunciation.
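The cats/dogs example can be put as a small sketch in which the spelling of the plural stays constant while its pronunciation depends on the preceding sound. The sound labels and the set of voiceless sounds below are simplifying assumptions made for illustration only, and stems ending in sibilants (which take a different plural form) are deliberately left out.

```python
# Assumption: a simplified set of voiceless final sounds, written with ad hoc
# ASCII labels ("th" stands for the voiceless sound at the end of "moth").
VOICELESS = {"p", "t", "k", "f", "th"}

def plural(word: str, final_sound: str) -> tuple:
    """Return (spelling, pronunciation of the plural ending) for a regular noun.

    The spelling always adds the letter s; only the sound varies with the
    voicing of the stem-final sound. Sibilant-final stems are not handled.
    """
    spelling = word + "s"
    ending_sound = "s" if final_sound in VOICELESS else "z"
    return spelling, ending_sound

for word, final_sound in [("cat", "t"), ("dog", "g")]:
    spelling, ending_sound = plural(word, final_sound)
    print(f"{word} -> {spelling}, plural ending pronounced as '{ending_sound}'")
```

The single spelling rule (add s) corresponds to the consistent representation of the plural morphophoneme, while the branch on voicing corresponds to the phonetic variation that the orthography ignores.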

It is characteristic of English that the degree of transparency of the
mapping between word components and their orthographic representation varies
considerably from word to word. This diversity is a consequence of the many
and varied sources of the English vocabulary. There are, on the one hand,
words like [illegible] that have a morphophonemic structure, and hence a
spelling, that is in close correspondence to a typical phonetic realization of
the word. On the other hand, there are words in which the morphophonemic
structure for one or more segments is at some remove from a phonetic
realization of the word. This occurs frequently in words that are foreign
borrowings (for example, bourgeois) or in words reflecting archaic forms (for
example, gnaw). In contrasting these two extremes, we might characterize the
mapping for the first set of words as being all but transparent, whereas the
mapping of the second set is relatively opaque to many, perhaps most, users of
English.

*... Haskins Laboratories from the National Institute of Child Health and
Human Development (HD-01994).

Many English words have a degree of orthographic transparency that lies
somewhere between the extremes represented by the examples given above. Many
words are more or less straightforward except that they contain a "problem
segment." Examples include such words as thinned, misspell, and grammar. At
one specific location in each of these the relationship between the
morphophonemic and phonetic structure is not immediately transparent in the
spelling. In cases such as these, correct spelling could be facilitated by
apprehending the morphemic structure (mis + spell requires retaining both
s's), the orthographic conventions (thin + ed requires doubling the n), or the
derivational relationships (the identity of the reduced vowel in grammar can
be uncovered by relating the word to cognate forms in which the same vowel
segment is not reduced, as in grammatical or grammarian).
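Two of the cases just mentioned, misspell and thinned, can be sketched as simple operations on morphemes. The functions below are hypothetical illustrations: the doubling rule is a simplified version of the familiar convention for one-syllable stems (single vowel letter followed by a single final consonant letter) and is not offered as a complete account of English consonant doubling.

```python
VOWEL_LETTERS = set("aeiou")

def add_prefix(prefix: str, stem: str) -> str:
    """Join prefix and stem without deleting letters, so mis + spell keeps both s's."""
    return prefix + stem

def add_ed(stem: str) -> str:
    """Add -ed to a one-syllable stem (assumption: input is monosyllabic).

    Simplified convention: double the final consonant letter when the stem ends
    in consonant + single vowel letter + single consonant letter (not w, x, y).
    """
    ends_in_single_consonant = (
        len(stem) >= 3
        and stem[-1] not in VOWEL_LETTERS
        and stem[-1] not in "wxy"
        and stem[-2] in VOWEL_LETTERS
        and stem[-3] not in VOWEL_LETTERS
    )
    if ends_in_single_consonant:
        return stem + stem[-1] + "ed"
    return stem + "ed"

print(add_prefix("mis", "spell"))   # both s's are retained
print(add_ed("thin"))               # final n is doubled
print(add_ed("rain"))               # two vowel letters, so no doubling
```

A speller who represents misspell as two morphemes and thinned as stem plus suffix has a principled route to the doubled letters, which is the kind of knowledge the paragraph above describes.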

It is one thing, however, to demonstrate that order exists in the mapping
of word and orthography. It is quite another to show that the regularities
are apprehended and utilized by ordinary spellers who are not linguistic
scholars. If we accept the premise that English orthography is by and large a
rational system, it is reasonable to suppose that successful use of the
orthography may be dependent on the users' ability to understand the system
or on what we shall call their "linguistic sensitivity."

We use the term "linguistic sensitivity" to refer to the ability to
apprehend the inherent regularities at various levels of linguistic
representation and the ability to exploit this knowledge in reading and
writing words. There exists already considerable evidence that successful
readers can be distinguished from unsuccessful ones on a number of
metalinguistic abilities (Fowler, Shankweiler, & Liberman, 1979; Liberman,
Shankweiler, Fischer, & Carter, 1974; Morais, Cary, Alegria, & Bertelson,
1979; Perfetti & McCutchen, in press; Vellutino, 1979). It is possible that
major differences in linguistic sensitivity may also be associated with the
variations in spelling ability that are found even among highly schooled
adults. In the past, investigators have looked repeatedly to nonlinguistic
explanations, appealing, for example, to individual differences in visual
memory ability (Shaw, 1965; Witherspoon, 1973). The alternative view is that
spelling draws heavily upon knowledge of linguistic structure. Although this
viewpoint is not new (see, in particular, Chomsky & Halle, 1968), the recent
spate of papers on spelling offers little direct empirical evidence either pro
or con (but see Frith, 1980; Marcel, 1980; and Steinberg, 1973). The present
study was designed to fill what seemed an obvious need.

Before an empirical investigation could be started, test materials
capable of assessing sensitivity to the structural properties of the
orthography had to be developed. Although some experimental spelling tests
(e.g., Barron, 1980) categorize words as "regular" or "irregular," the basis
for classification is not usually made explicit. The classification of
"regular" is typically applied to words having a presumed straightforward
correspondence between spelling patterns and phonetic structure (e.g., fresh).
Accordingly, words with regularities of all other kinds are typically
designated as "irregular" (e.g., sign), despite their demonstrable adherence
to a pattern or rule. A further shortcoming of the available tests is that
they are constructed without regard to variations in word frequency.
Together, these deficiencies make existing tests unsuitable for our purposes.
Accordingly, an Experimental Spelling Test was developed to overcome these
limitations. While controlling for word frequency, it attempts to capture
some of the structural properties that give rise to different levels of
transparency in English spelling.

The hypothesis under investigation is that educated adults who differ in
spelling ability on conventional spelling tests differ correspondingly in the
knowledge we call linguistic sensitivity. To explore this possibility, two
experiments were conducted. In the first, the performance of good and poor
spellers was examined using the Experimental Spelling Test. It was
anticipated that for all subjects those words in which the morphophonemic
representation is at some remove from the phonetic structure would be more
often misspelled, other things equal, than those words in which the two levels
of representation more nearly coincide. Moreover, if good and poor spellers
are primarily distinguished on the basis of their metalinguistic abilities,
then the largest differences between the groups on the Experimental Spelling
Test ought to occur in spelling the words whose mapping can only be
rationalized linguistically. Smaller differences, or no difference, should
occur on the opaque words, for the spellings of which the subjects may have to
rely chiefly on rote memory.

If college-level adults who differ in spelling proficiency can be
distinguished on the basis of their sensitivity to certain structural
characteristics of real words, then differences among them should be
especially evident on tasks that are free from the effects of word-specific
learning. The second experiment of this investigation explored this
possibility by comparing the performance of good and poor spellers on tasks
that tap certain linguistic abilities presumed to be useful in spelling the
words on the Experimental Spelling Test. These abilities include knowledge of
abstract spelling patterns, familiarity with principles involving prefixation
and suffixation, and ability to use tacit knowledge of English morphophonemics
in order to disambiguate reduced vowels. New materials had to be developed
for tapping these abilities. Pseudowords rather than actual words were used
where necessary to ascertain that the subjects had acquired general principles
of orthographic representation that can be applied to new instances.

In addition to the assessment of metalinguistic abilities associated with
spelling performance, Experiment 2 also examined the possibility that good and
poor spellers may differ in their use of visual retention strategies. Since
visual memory is often cited as a major determinant of spelling proficiency
(Shaw, 1965; Sloboda, 1980; Tenney, 1980; Witherspoon, 1973), a task
assessing visual memory ability for abstract designs was included. It was
anticipated that on the linguistic tasks, good spellers would continue to
outperform those who were less proficient, while no difference between the
groups would emerge on the task of visual memory for designs. Finally, the
groups of good and poor spellers were compared on tasks designed to tap
broader aspects of literacy, namely, reading skills and vocabulary knowledge.
Experiment 1

The purpose of this experiment was to compare the performance of
college-educated adults who differ in spelling proficiency on spelling tasks
that incorporate graded changes in orthographic transparency.

Method 



Subjects. Two groups of subjects, good spellers (N=18) and poor spellers
(N=20), were selected from a larger sample of 88 undergraduate psychology
students who responded to a notice inviting them to participate in an
investigation of spelling ability. The notice had encouraged people to sign up
regardless of their level of spelling proficiency. The 88 initial participants
were all native speakers of American English, 21 males and 67 females, ranging
in age from 18 to 37 years (mean age = 20 years). While they do not constitute
a random sample, those participating did represent a broad range of spelling
proficiency as indicated by their scores on the spelling section of the Wide
Range Achievement Test (Jastak, Bijou, & Jastak, 1965). Grade equivalent
scores on the WRAT ranged from 8.[illegible] to 15.7 with a mean of 12.3.

Those identified as good spellers for the purpose of this study performed
[illegible] on the WRAT (mean grade equivalent was [illegible], S.D. = 0.51).
Those categorized as poor spellers were clearly deficient, performing on the
average four years below grade level (mean grade equivalent was 10.2,
S.D. = 0.6[illegible]). The good speller group included 6 males and 12
females; the poor spellers consisted of 4 males and 16 females.

Stimuli. The chief instrument used was the new three-part Experimental
Spelling Test of 120 words. The words were grouped into three levels, 40 in
each, differing in the transparency of orthographic representation. For Level
1 words, the phonetic realization is, for any given speaker, reasonably close
to the orthographic representation, and the spelling patterns are, for the
most part, restricted to those having a high frequency of occurrence in
written English. Examples of words so classified are harp, adverb, and retort.

Level 2 words each contain an ambiguous segment involving some departure
from straightforward phonetic mapping. They are further partitioned into two
subtypes. Level 2A words require either a rote application of established
orthographic conventions, or a sensitivity to regularities at the surface
phonetic level. For example, a speller may know that the /n/ segment is
represented by nn in thinned but by n in chained. The experienced writer does
this quite mechanically, having learned that in monosyllabic words the final
consonant letter is doubled when preceded by a single vowel but not doubled
when preceded by a vowel digraph. Indeed, in many instances the graphemic
conventions relate to phonetic facts such as those involving lax versus tense
vowels. In contrast, Level 2B words draw upon abstract morphophonemic
knowledge to derive the spelling patterns for the ambiguous segments. For
example, in order to know that the final consonant letter in confer is doubled
in [illegible] but not in conference, a speller must apprehend linguistic
regularities relating to stress placement, and how these govern spelling. The
generalizations included in the list are described in Appendix 1.

Level 3 words can be derived only partially by using morphophonemic
knowledge, since they contain one or more segments that do not generally occur
in English or occur with low frequency. Their relative lack of transparency
stems from two factors: the words are related to borrowed forms largely
obscure to the nonscholar and the nonpolyglot, and their spelling patterns
have a much lower frequency of occurrence in English than do the patterns
appearing in Level 1 and 2 words. Examples include such words as gnaw,
bourgeois, and Fahrenheit.
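
The Level 2A doubling convention described above (thinned versus chained) is
mechanical enough to be written as a short rule. The following Python sketch
is an illustration only: the function name add_ed and the letter-level reading
of "single vowel" and "vowel digraph" are assumptions made here, not part of
the test materials, and the rule is meant only for monosyllabic bases that end
in a consonant letter.

```python
VOWELS = set("aeiou")


def add_ed(base: str) -> str:
    """Attach -ed to a monosyllabic, consonant-final base.

    Hypothetical helper: the final consonant letter is doubled when it is
    preceded by a single vowel letter (thin -> thinned) but not when it is
    preceded by a vowel digraph (chain -> chained).
    """
    word = base.lower()
    if len(word) < 2:
        return word + "ed"
    last, prev = word[-1], word[-2]
    before_prev = word[-3] if len(word) >= 3 else ""
    # w, x and y are conventionally not doubled, so they are excluded here.
    ends_in_consonant = last not in VOWELS and last not in "wxy"
    single_vowel = prev in VOWELS and before_prev not in VOWELS
    if ends_in_consonant and single_vowel:
        return word + last + "ed"
    return word + "ed"


print(add_ed("thin"))   # thinned
print(add_ed("chain"))  # chained
```

A rule of this kind captures only the surface regularity; it has no access to
the stress-based generalizations that Level 2B words require.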

The three levels were balanced insofar as possible for syllable length
(each level approximating a mean of 2.8 syllables) and frequency of occurrence
in written English (each level approximating a mean of 6.1 occurrences per
1,014,232 words of natural language text), according to the Kucera and Francis
(1967) statistics. Within Level 2 the 2A words had a mean frequency of
occurrence of 5.7 versus 6.8 for the 2B words. The 2A words had a mean of
2.[illegible] syllables versus [illegible] for the 2B words. The 120 words
(which are listed in Appendix 2) were randomized and recorded on magnetic tape
at [illegible]-s intervals.

Procedure

The subjects were tested in small groups. The testing session lasted for 
one hour during which the following tasks were administered. 

1. Spelling Production Task. The subjects' task was to print each
dictated word in the space provided and to attempt every word. Each was
repeated once.

2. Spelling Recognition Task. The same items were presented again, this
time as a multiple-choice recognition test. The answer sheet offered three 
alternative spellings for each dictated word and, additionally, a "none of 
these" option. Each of the three alternatives was phonetically readable as 
the stimulus word; thus no foil could be eliminated merely on the basis of a 
gross disparity between the spelling of an item and its phonetic realization. 
Common misspellings of the stimulus words appeared as foils.

3. Spelling Subtest of the Wide Range Achievement Test (Jastak et al.,
1965). The words from the Level 2 spelling list of the WRAT were recorded on
magnetic tape at 10-s intervals. The subjects' task was to print the words in
the space provided.

Scoring of Spelling Errors 

The following error categories were used to analyze the misspellings: 

1. Word Errors were scored for each misspelled word without regard to
the number of misspelled segments (for example, when grammar was spelled
"grammer" or sergeant as "sargent").

2. Segment Errors were scored for every incorrect spelling pattern, as
defined by guidelines established by Hanna, Hanna, Hodges, and Rudorf (1966).
Segment errors were further classified as substitutions, omissions, or
insertions.

a. Substitution Errors were scored when an incorrect grapheme was
used in place of the correct letters. These were further classified as
"phonetic substitutions" when the word as spelled captures the word's
approximate phonetic shape (as when rhododendron was spelled
"rododendron" or when gnaw was given as "naw") or "nonphonetic
substitutions" (for example, when adverb was spelled "advert").

b. Omission Errors were scored when a grapheme needed for the
orthographic representation of a phonological segment was omitted (for
example, inflate spelled as "infate").

c. Insertion Errors were scored when an additional grapheme was
included (for example, retort spelled as "restort").
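
The three segment-error categories can be approximated at the level of single
letters. The Python sketch below is a rough illustration, not the scoring
procedure used in the study: the actual scoring followed the grapheme
guidelines of Hanna et al. (1966), whereas this hypothetical function
(classify_letter_errors) simply aligns the letters of the target and the
response with the standard-library difflib module.

```python
from difflib import SequenceMatcher


def classify_letter_errors(target: str, response: str) -> list[tuple[str, str, str]]:
    """Return (error_type, expected_letters, written_letters) for each mismatch.

    Letter-level approximation only; grapheme-level scoring may label the
    same misspelling differently.
    """
    labels = {"replace": "substitution", "delete": "omission", "insert": "insertion"}
    errors = []
    matcher = SequenceMatcher(None, target, response, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            errors.append((labels[tag], target[i1:i2], response[j1:j2]))
    return errors


print(classify_letter_errors("inflate", "infate"))   # omission of "l"
print(classify_letter_errors("retort", "restort"))   # insertion of "s"
print(classify_letter_errors("adverb", "advert"))    # substitution of "t" for "b"
```

The two levels of analysis can disagree. A letter alignment treats "naw" for
gnaw as an omitted letter, while the scheme above counts it as a phonetic
substitution because the grapheme gn has been replaced by n.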

Results and Discussion

A preliminary step was to establish that the Experimental Spelling Test
designed provided a reliable and valid estimate of general spelling ability.
A test-retest comparison of word errors carried out on a subset (N=30) of the
88 participants resulted in a reliability coefficient of .97 (p < .001) on the
Spelling Production Task. The results of a correlational analysis revealed
that word error scores on the Spelling Production Task correlated
significantly ([illegible], p < .001) with error scores on a standardized test
of spelling achievement, the Wide Range Achievement Test. Together, these
results suggest that the test yields a reliable measure of spelling
achievement and gives results that are highly comparable to a widely-used
conventional test of spelling proficiency.

An analysis of item difficulty on the Spelling Production Task was also
conducted to examine for possible floor or ceiling effects. It was found that
no word was misspelled by every subject, and even the most difficult words on
the list (desiccate and sarsaparilla) were spelled correctly by at least two
of the 88 subjects. Although 20 of the 120 words were never misspelled, no
subject obtained a perfect score. The number of misspelled words ranged from
18 to 52 with a mean of 33.9 (S.D. = 8.9).

Spelling Production Task: The locus of spelling difficulty. It is
important to discover whether the spelling mistakes made by poor spellers are
limited to words having particular orthographic or structural characteristics
or whether the difficulties reveal more general deficiencies in transcribing
English. To answer this, we first looked at the distribution of misspelled
words on the Spelling Production Task across the three levels of orthographic
transparency (see Figure 1). The data were analyzed by a two-way analysis of
variance in which the between-groups factor was spelling group, the
within-groups factor was orthographic level, and the dependent variable was
the number of word errors. As can be seen in Figure 1, the good and poor
spellers differed sharply across each of the three orthographic levels:
F(1,36) = 15[illegible].73, p < .001, MSe = 7.95, for group; F(2,72) =
[illegible], p < .001, MSe = [illegible].57, for level. The interaction
between group and orthographic level was also significant, F(2,72) =
[illegible], p < .001, MSe = [illegible].57. Good spellers made significantly
fewer errors at each level than did poor spellers (at Level 1, t(36) =
[illegible], p < [illegible]; at Level 2, t(36) = 12.[illegible], p < .001;
and at Level 3, t(36) = 7.35, p < .001). It is of interest to note that the
interaction remains significant when the group by level analysis is recomputed
for Levels 2 and 3 alone, F(1,36) = 13.[illegible]3, p < .001, MSe =
27.1[illegible]. This suggests that the full interaction effect is not simply
a consequence of the greater accuracy of both groups in spelling the
orthographically transparent Level 1 words, but instead reflects performance
differences all across the range of orthographic transparency.

Figure 1. Comparison of word errors on spelling production task as a function
of orthographic transparency, good versus poor spellers.



The finding that good and poor spellers differ significantly in their
ability to spell words at each of the three levels suggests that poor spellers
have general deficiencies in spelling rather than isolated, local difficulties
restricted to particular exceptional words.

As expected, few Level 1 words were misspelled by either group.
Nevertheless, even on these the two groups differed significantly. Errors
made by poor spellers were quite varied. In 11 percent of the cases the
dictated item was apparently misperceived, perhaps because of unfamiliarity
with the word, for example, vortex rendered as "thortex" or "vortext." In 29
percent, errors occurred in relation to the representation of free versus
checked vowels, for example, diplomat rendered as "diplomate" and emit as
"emite." However, the bulk of the errors (60 percent) were instances of the
use of spelling patterns that in another context would be appropriate but are
incorrect for the particular morpheme being represented, for example, spelling
retort as "rhetort" and punishment as "punnishment." In contrast to the
greater range of difficulty experienced by poor spellers, the Level 1 errors
of good spellers, with the exception of the word [illegible] (which many
spelled [illegible]), were confined to occasional misperceptions of a stimulus
word ([illegible] as [illegible] or [illegible] as "compensate").

Differences in the ability of good and poor spellers to transcribe words
are reflected in quantitative differences in virtually every aspect of
performance on which the two groups were compared. Table 1 presents an
overview of the analysis of segment errors. As anticipated, most errors
occurred on those phonologic segments that departed most conspicuously from a
straightforward phonetic transcription. As Table 1 reveals, for both groups
substitution errors accounted for the bulk of the errors made, followed by a
much smaller percentage of omissions and even fewer insertions. Overall, the
poor spellers made significantly more errors of each type (for substitutions,
t(36) = 8.98, p < .001; for omissions, t(36) = 3.65, p < .001; and for
insertions, t(36) = [illegible], p < .02). The low percentage of omissions and
insertions indicates that both groups were generally accurate in preserving
the segmental structure of words.



Table 1

Summary of Segment Errors on Spelling Production Test, Good and Poor Spellers

                        Good Spellers                  Poor Spellers
                          Percent   Percent              Percent   Percent
                          Substi-   Total                Substi-   Total
Error Type        Mean    tutions   Error        Mean    tutions   Error

Substitutions     31.9              85.8         63.2              83.9
  Phonetic        27.9    87.5      75.0         56.2    88.9      74.6
  Nonphonetic      4.0    12.5      10.8          7.2    11.4       9.6
  Consonants      12.5    39.2      33.6         22.8    36.1      30.3
  Vowels          19.3    60.5      51.9         40.6    64.2      53.9
Omissions          4.4              11.8         10.2              13.5
Insertions         0.9               2.4          1.9               2.5
Total Errors      37.2                           75.3

Since errors of substitution were most numerous, the analysis focused on
these. It was found that for both groups significantly more substitutions
occurred on vowels than consonants, with the poor spellers again making
significantly more errors than good spellers on both consonants (t(36) = 8.03,
p < .001) and vowels (t(36) = 10.39, p < .001). The greater difficulty in
spelling vowel segments is expected since the mapping between orthographic
patterns and vowel sounds is generally more variable than it is for
consonants. Finally, for both groups phonetic substitutions significantly
outnumbered nonphonetic substitutions, with the poor spellers again making
significantly more of each error type than the good spellers (for phonetic
substitutions, t(36) = [illegible], p < .001; for nonphonetic substitutions,
t(36) = 3.15, p < .01). These data suggest that highly-schooled adults usually
represent the phonetic characteristics of words adequately but sometimes fail
to attend to the deeper morphophonemic regularities that would have led to the
correct spelling.

Production errors versus recognition errors in spelling. In examining
the effect of orthographic transparency on spelling accuracy it is of interest
to compare the performance of the two groups on the task utilizing a
recognition format. Figure 2 presents these data for the good spellers (top)
and poor spellers (bottom). The data were analyzed using a three-way analysis
of variance in which the between-groups factor is spelling group and the
within-groups factors are condition (production and recognition) and level of
orthographic transparency (Level 1, 2, and 3). The dependent variable was the
number of misspelled words.

As expected, the task of recognizing correctly spelled words proved to be
significantly easier for the two groups combined than the task requiring
spelling production (for condition, F(1,36) = 92.32, p < .001, MSe = 2.68).
The mean word error score under the recognition format was 27.8 compared with
a higher mean error score of 3[illegible] on the production task. No
significant effects were found for the interactions of group by condition,
F(1,36) = 2.54, p < .12, MSe = 2.68, or group by condition by level,
F(2,72) = 2.56, p < .08, MSe = 2.16. Of particular interest, however, is the
finding that for both groups the overall increase in accuracy that occurred
under the recognition condition is largely concentrated on the
morphophonemically opaque Level 3 words (for condition by level,
F(2,72) = 97.85, p < .001, MSe = 2.16). Whereas subjects typically reduced
their word error score on Level 3 words, smaller reductions in errors occurred
in spelling the more transparent words. The mean word error score on Level 1
words was 2.0 on the production task versus 1.5 on the recognition task and on
Level 2 words, 11.[illegible] mean word errors versus 11.5 mean word errors,
respectively.

Differences between good and poor spellers in linguistic sensitivity.
While these findings underscore the quantitative differences between good and
poor spellers, the critical abilities distinguishing the two groups remain
undefined. From a linguistic perspective there are certain skills that still
need to be explored. On the one hand, for example, poor spellers might be
differentiated from good spellers in their lack of sensitivity to surface
orthographic and phonetic regularities that signal the use of particular
spelling patterns. Alternatively, or additionally, they might differ in their
ability to penetrate below the surface structure to the deeper morphophonemic
regularities that determine the appropriate spelling patterns.

In order to evaluate these possibilities, it was useful to examine the
performance of good and poor spellers on the Level 2 words where the
performance differences between the groups were largest. It will be recalled
that each Level 2 word contained an ambiguous segment. In approximately half
of the words (Level 2A), the spelling of that segment could be ascertained by
recognizing certain orthographic regularities and by implementing the relevant
orthographic conventions. In the remaining half (2B), the ambiguous segment
could be derived only by accessing the morphophonemic information.

Figure 2. Comparison of word errors on production and recognition tasks as a
function of orthographic transparency, good versus poor spellers.




Fischer et al.: Spelling Ability and Linguistic Sensitivity

In order to determine whether the good and poor spellers differed in
their ability to spell these two subclasses, it was necessary to ascertain
whether the errors that occurred did indeed involve the segment designated as
the ambiguous segment (the "problem segment"). An examination of the errors
revealed that, in both groups, 83 percent occurred on problem segments
involving either orthographic or morphophonemic decisions, while the remaining
17 percent occurred on other segments within these words. The analysis was
therefore restricted to those errors that occurred at the critical location.
In addition, because two spellings were found to be acceptable for one of the
Level 2A words (cancelled and canceled, Webster, 1963), it was excluded from
the analysis, reducing the number of Level 2A words to 19.

In Figure 3 the mean percentage of word errors is presented for Level 2A
(orthographic) and Level 2B (morphophonemic) words. The data displayed in
Figure 3 were analyzed by a two-way analysis of variance in which the
between-groups factor was spelling group and the within-groups factor was
error type (orthographic or morphophonemic). The dependent variable was the
percentage of word errors based on 19 words in Level 2A and 20 words in Level
2B.



Figure 3. Comparison of orthographic and morphophonemic errors on Level 2
words, good versus poor spellers.

Figure 3 shows a wide separation in the performance of the good and poor
spellers. Of particular interest, however, is the unequal performance of the
two groups on the two categories of words, yielding a significant interaction
between group and error type, F(1,36) = 10.29, p < .003, MSe =
5[illegible].0[illegible]. As would be expected, good spellers made fewer
errors than poor spellers both in applying orthographic conventions, Fisher's
post hoc t(36) = 7.00, p < .001, and in spelling words involving access to
morphophonemic structure, t(36) = 9.5[illegible], p < .001. The more notable
result, however, is that good spellers found words involving morphophonemic
decisions significantly easier than words involving purely orthographic
decisions, t(17) = 2.73, p < .02, while the poor spellers showed no
significant difference in their ability to spell the two types of words,
t(19) = 1.98, p > .05. This suggests that good and poor spellers may differ in
their ability to penetrate below the surface phonetic structure to the
underlying morphophonemic structure of words. To ascertain whether this
finding could be generalized to other words not included in the present list,
a second ANOVA was computed using the 39 Level 2 words as the random variable
(Clark, 1973); the between-groups factor was word type (orthographic vs.
morphophonemic) and the within-groups factor was group (good vs. poor
spellers). The dependent variable was the percentage of errors made by good
and poor spellers on each of the words. The analysis yielded the following
effects: word type, F(1,37) = .02, p > .05; group, F(1,37) = 60.30, p < .001,
MSe = 162.[illegible]; word by group, F(1,37) = 6.32, p < .001. As a further
step, the min F' was computed. The outcome suggests that the differences
observed between the groups in spelling the Level 2A and 2B words extend
beyond the particular words used in this experiment, min
F'(1,5[illegible]) = 5.0, p < .03.
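
For readers unfamiliar with the statistic, min F' combines a by-subjects F
ratio (F1) and a by-items F ratio (F2) into a single conservative test. The
Python sketch below uses the formula as it is commonly stated following Clark
(1973); the function name, argument names, and the numbers in the example are
illustrative assumptions, not values taken from this study.

```python
def min_f_prime(f1: float, df1_error: float,
                f2: float, df2_error: float) -> tuple[float, float]:
    """Return (min F', denominator df) from by-subjects and by-items F ratios.

    f1, df1_error: F ratio and error df from the by-subjects analysis.
    f2, df2_error: F ratio and error df from the by-items analysis.
    The numerator df of min F' is the same as that of f1 and f2.
    """
    min_f = (f1 * f2) / (f1 + f2)
    denom_df = (f1 + f2) ** 2 / (f1 ** 2 / df2_error + f2 ** 2 / df1_error)
    return min_f, denom_df


# Hypothetical inputs chosen only to show the arithmetic.
value, df = min_f_prime(f1=10.0, df1_error=30, f2=10.0, df2_error=30)
print(round(value, 2), round(df, 1))
```

Because min F' can never exceed the smaller of F1 and F2, an effect that
survives it holds across both subjects and items.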

The contribution of nonlinguistic abilities to spelling proficiency. So
far the findings have suggested that differences in spelling achievement are
at least in part associated with differences in apprehension of word
structure. It is also of interest to examine the results as they relate to a
long-held belief that individual differences in spelling proficiency may
reflect differences in visual retentiveness. Two aspects of the data are
pertinent to this question. If visual memory skill were the critical
distinguishing factor, then the greatest performance difference between the
groups should occur in spelling the opaque Level 3 words, since these
presumably have to be learned and recalled by rote. However, on re-examining
Figure 1, one finds that although good and poor spellers did in fact differ in
their ability to spell Level 3 words, the magnitude of the difference is
smaller than that which occurred in spelling the derivable Level 2 words.
These results suggest that if there are differences between the groups in
their ability to recall visual images of word patterns, these differences are
of lesser importance than those relating to the understanding of how the
orthography maps word structure.

Moreover, if visual memory ability were an especially critical skill in
spelling, good and poor spellers should differ in their ability to recognize
correct spellings when given alternatives from which to choose. Reexamination
of Figure 2 suggests that the two groups are not readily distinguishable in
this regard. This is confirmed by the finding that the relevant interaction
effects were not significant (for group by condition, F(1,36) = 2.54, p > .06,
MSe = 2.68; for group by condition by level, F(2,72) = 2.56, p > .05,
MSe = 2.16). Thus, on the spelling recognition task good spellers were not
significantly better able than poor spellers to profit from visually presented
alternatives. While it is quite likely that visual memory plays some role in
spelling (especially for Level 3 type words), these comparisons have uncovered
no evidence that differences in the ability to access words as visual patterns
can account for the sharp differences in spelling performance observed in this
study.

Instead, the results of the spelling test suggested that linguistic
factors play an important role in spelling. For both good and poor spellers
the accuracy with which words were spelled was clearly influenced by the
variations in orthographic transparency represented by the three levels of
words. Spelling was most accurate in cases where the underlying morphophonemic
structure was straightforwardly reflected in the phonetic realization of the
word and became progressively more difficult as the relationship between the
underlying morphophonemic structure and the written representation became
increasingly obscured by intervening phonologic and orthographic rules.

Further evidence that linguistic abilities are critical in differentiating
good and poor spellers came from the finding that the two groups were most
readily distinguished by their performance on Level 2 words. If rote memory
were the critical skill in spelling, Level 3 words should have most sharply
distinguished the groups. Indeed, further analysis of the Level 2 errors
revealed that poor spellers were less proficient in accessing the underlying
morphophonemic structure when it was not clearly reflected in the phonetic
realization of the word. This finding underscores what may be an important
difference between the two groups: while good spellers found the spelling of
words involving access to morphophonemic structure significantly easier than
words involving the implementation of orthographic conventions, poor spellers
did not.



Experiment 2 



The primary purpose of Experiment 2 was to discover whether the abilities 
that underlie spelling competence are instances of specific learning or wheth- 
er they are generalizations that can be applied productively to other English 
words. Specifically, the question addressed was whether college students who 
differ in their ability to spell familiar words would also differ in their 
ability to spell pseudowords that conform to the phonotactic constraints of 
English. The specific spelling skills under investigation included knowledge
of the recurrent spelling patterns of English orthography, familiarity with
the morphological principles guiding the use of prefixes and suffixes, and
ability to use morphophonemic information to disambiguate reduced vowels. The
relevance of these skills to other aspects of written language, namely word
recognition and reading comprehension, was also examined. A secondary purpose
of the experiment was to explore the possibility that good and poor spellers
differ in their ability to learn and subsequently to recognize nonlinguistic,
nonrepresentational visual patterns.

Method

Subjects. The intent was to include the 15 best and the 15 poorest
spellers from Experiment 1, but because some of the original subjects were
unavailable for Experiment 2, eleven additional subjects were recruited from
the original subject pool. The 15 spellers constituting the good speller
group all scored more than one standard deviation above the mean on the
earlier described Spelling Test of Experiment 1 (mean error score = 23.7);
the 15 poor spellers scored at least one standard deviation below the mean
(mean = [illegible]). The mean WRAT spelling grade equivalent was 13.9 for the
good spellers and 10.[illegible] for the poor spellers. Eight of the good
spellers and 11 of the poor spellers had participated in Experiment 1.

Stimuli and procedure. The following tasks, designed to evaluate specific
metalinguistic and nonlinguistic abilities relating to spelling, were
administered. The 30 subjects were tested in small groups in two one-hour
sessions.



1. Knowledge of Abstract Spelling Patterns. This task assessed the
subjects' knowledge of the 174 principal spelling patterns identified by Hanna
et al. (1966). The patterns included 93 consonant patterns and 81 spellings
for vowels.

A list of 348 English-like spoken pseudowords was prepared and recorded
on magnetic tape. It included two items for each of Hanna's 174 spelling
patterns. Pseudowords that adhere to the phonotactic constraints of English
were used instead of actual words in order to promote adoption of an analytic
mode of processing; that is, to discourage the subjects from responding to
items holistically as they might well do in the case of overlearned, familiar
words.

Each dictated pseudoword was printed on a prepared sheet. In each, a
single spelling pattern was underlined. In half of the items the underlined
portion constituted an acceptable spelling for the corresponding phoneme and
in half an impossible spelling. In each case, the nonunderlined portion was
spelled in a manner consistent with English orthographic practice. All 348
items appeared as orthographically acceptable letter sequences regardless of
whether the underlined portion was appropriately spelled; that is, there were
no letter sequences that do not occur in English. In those items where the
underlined spelling was not a legitimate representation of the corresponding
phoneme, the presented spellings were confined to the appropriate class of
phoneme (consonant or vowel) but never included spelling patterns that could,
in any English context, legitimately represent the targeted phoneme.

The tape-recorded stimuli were presented at intervals of six seconds.
Subjects were asked to circle "yes" if the underlined portion of the stimulus
word was judged to be an acceptable spelling of the target segment or to
circle "no" if it was not. Three sample items were administered as a pretest.

2. Principles of Prefixation. This task assessed knowledge of how the
orthography attaches the prefix to the base word. A list of 60 items was
prepared for auditory presentation consisting of three types of words:
monomorphemic words (for example, constable); words with assimilated prefixes,
such as those formed by the addition of the prefix /ad/ to base words
beginning with c, f, g, l, [illegible] (for example, accrue, affluence and
aggravate), or those formed by addition of /com/ to base words beginning with
either m, l or n (for example, committee, collateral and connubial); and words
with prefixes not involving consonant assimilation such as those formed by the
addition of the prefixes mis, dis, contra and un (for example, misshapen,
dissimilar and contradiction).

In order to forestall the possibility that a subject could mechanically
partition the initial letters of the word as the basis for dividing the prefix
from the stem, without examining the whole word, an effort was made to include
words in the list that began with the same phonetic sequence even though
different principles of prefixation are involved (e.g., constable, connubial,
concurrent).

The tape-recorded words were presented at 10-s intervals. Subjects were
asked to print each dictated word and to separate the prefix from the base by
a dash. They were cautioned that some of the words would not involve a prefix,
in which case they were to write a dash first, followed by the spelling of the
word. Three examples, with and without prefixes, were given. Items were scored
correct if the letter immediately preceding and succeeding the dash was
accurate.
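
The scoring criterion just described looks only at the two letters that flank
the dash. The Python sketch below is one possible reading of that criterion,
offered as an assumption rather than as the procedure actually used: the
function name score_prefix_item and the idea of comparing each response with a
dashed answer key are hypothetical.

```python
def score_prefix_item(response: str, key: str) -> bool:
    """Score one prefixation item against a dashed answer key.

    An item counts as correct when the letter just before the dash and the
    letter just after it both match the key. Words without a prefix are
    written with a leading dash, so the "before" letter is then empty.
    """
    if response.count("-") != 1 or key.count("-") != 1:
        return False
    r_before, r_after = response.lower().split("-")
    k_before, k_after = key.lower().split("-")
    return r_before[-1:] == k_before[-1:] and r_after[:1] == k_after[:1]


print(score_prefix_item("ac-crue", "ac-crue"))        # True
print(score_prefix_item("a-ccrue", "ac-crue"))        # False
print(score_prefix_item("-constable", "-constable"))  # True
```

Under this reading a response such as "ac-crew" would still be scored correct,
since errors elsewhere in the word do not affect the letters adjacent to the
dash.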

3. Disambiguating Reduced Vowels. This task tested ability to access
and utilize phonological information in representing reduced vowels. The test
list was made up of 50 English-like words all of which ended in the unstressed
syllables /ə/ble or /ə/nts. In some cases the target pseudoword was dictated
alone, while in other cases it was preceded by one or more pseudowords
phonologically related to the target. In either case, relevant phonological
cues were available to assist the speller in disambiguating the reduced vowel
in the targeted word. For some of the items the cue was in the relationship of
the spoken pseudoword to its "derivative form." The basis of the derivations
is, of course, by analogy to actual words of similar structure. For example,
given the strings [ɛkstrʌpt, ɛkstrʌpʃən, ɛkstrʌptəbəl], the relationship of
[ɛkstrʌptəbəl] to [ɛkstrʌpt] and [ɛkstrʌpʃən] signals the use of the vowel i
to orthographically represent the reduced vowel in the penultimate syllable of
extruptible, as in the case of the words corrupt, corruption, corruptible. In
other cases, the phonemic context supplied by the pseudoword itself provided
the necessary cue for choosing the correct spelling pattern to represent the
reduced vowel. For example, the orthographic representation for the reduced
vowel in the penultimate syllable of [kantrəmɪsəbəl] is most likely to be i
since the pseudoword was formed in analogous fashion from a stem originally
occurring in Latin adjectives ending in -ibilis and later borrowed by English.

Spellings corresponding to each of the tape-recorded target pseudowords
were listed, but with omission of the reduced vowel in either the final or the
penultimate syllable. The omitted vowel was marked by a blank space in the
appropriate location. Beside each pseudoword, two vowel spellings were
presented as choices, a and i for pseudowords ending in /ə/ble and a and e for
items ending in /ə/nts. The subject's job was to choose the correct spelling
for the reduced vowel.

4. Principles of Suffixation. To assess mastery of the principles for
appending suffixes, a list of 24 pseudowords was prepared for taped
presentation along with directions for changing each word into a new word by
adding a given suffix. Thirteen English orthographic "rules" were incorporated
(for a listing of the rules see Witherspoon, 1973, pp. 282-285).

The items were dictated at 10-s intervals in a standard carrier phrase,
which instructed the subjects to change each stimulus item to a related form
by attaching a specified suffix (for example, "Change prin to prinnish"). The
answer sheet presented a spelled-out version of each pseudoword with space
alongside to write the word with the appended suffix.

In addition to the foregoing tasks that were specially prepared for this
study, several standard tests were also administered.

5. Wechsler Adult Intelligence Scale (WAIS) Vocabulary Subtest
(Wechsler, 1959).

Subjects were given answer booklets in which items were printed with a
space provided for the subject to write the definition of each stimulus word.
Before beginning the task, the examiner read each of the stimulus words aloud.

6. WRAT Reading Recognition. Oral reading level was assessed using the
reading section of the Wide Range Achievement Test (Jastak et al., 1965).
This requires subjects to read aloud a series of progressively more difficult
words within a prescribed time limit. The test was administered individually
to each subject according to the standard procedure.

7. Scholastic Aptitude Test Verbal Ability (Educational Testing
Service). SAT scores, required for admission to the university, were
available with the subjects' permission.

8. Kimura Recurring Figures Test (Kimura, 196^). A test of memory for
abstract designs that do not lend themselves readily to verbal labeling was
used to assess visual memory ability. The test was chosen to provide a
measure of visual memory uncontaminated by verbal cues.

The test was administered in the standard manner. Subjects first viewed
a set of 10 cards, on each of which was displayed a single design. They then
were shown 7 additional sets of 10 cards each. In each of the latter sets,
four of the designs from the original set recur, randomly interspersed with
six non-recurring designs. The task was to identify the recurring figures in
each of the seven sets of cards by circling "yes" or "no" on the accompanying
answer sheet.

Results and Discussion 



Performance on linguistic tasks that pertain to spelling. As can be seen
in Table 2, the general error pattern for the two subject groups was remarkably
similar. In both groups errors on vowel patterns accounted for approximately
68 percent of the total error score, while consonant errors accounted for the
remaining 32 percent. But overall, the poor spellers made significantly more
errors than did the good spellers in recognizing acceptable spelling patterns
for English morphophonemes, t(28) = 5.35, p < .001. The greater difficulty
experienced by poor spellers occurred both in identifying consonant patterns,
t(28) = 3.21, p < .01, and vowel patterns, t(28) = 5.?3, p < .001.

In segmenting prefixes from base morphemes, poor spellers again
demonstrated significantly more difficulty than did good spellers, t(28) =
3.81, p < .001. There was no difference between good and poor spellers in
segmenting nonassimilated prefixes from their base morphemes, t(28) = 1.47, p
> .05, but a significant difference emerged in segmenting prefixes involving
consonant assimilation, t(28) = 3.^^, p < .01. The nature of the difficulty
encountered by both groups was the same. Errors resulted from a failure to
use the double consonant pattern at the juncture of the prefix and the base
morpheme (for example, representing con-nubial as "co-nubial").



It is of interest to note that although good and poor spellers did not
differ significantly in recognizing the monomorphemic words, t(28) = 1.67, p >
.05, both groups found this aspect of the task difficult. Attempts to segment
words not having prefixes (for example, writing constable as "con-stable")
accounted for approximately 50 percent of the total error score.

                                   Table 2

 Summary Scores for Good and Poor Spellers on Linguistic and Nonlinguistic Tasks

                                 Good Spellers               Poor Spellers
                            Mean   Standard  Percent    Mean   Standard  Percent
Task                        Error  Deviation  Total     Error  Deviation  Total

1. Abstract Spelling Patterns Test
     Consonant Errors        4.7     2.9       32        8.0     2.9       32
     Vowel Errors           10.0     2.5       68       17.1     4.6       68
     Total Errors           14.7     4.3       --       25.1     6.1       --

2. Prefixation Test
     Nonassimilated
       Prefixes              2.2     1.9      15.5       3.1     1.3      14.4
     Assimilated
       Prefixes              4.1     2.5      28.9       7.9     3.^      3^.6
     No Prefixes             7.9     4.8      55.6      10.5     4.2      49.1

3. Suffixation Test
     Total Errors            4.^     1.6                 9.8     3.2

4. Reduced Vowel Test
     Total Errors            9.7     3.0                18.8     3.8

5. Kimura Figures
     Total Errors            8.8     4.7                 9.3     5.2
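
The percent-of-total figures reported in Table 2 appear to be each error category's mean expressed as a share of the total mean error score. The short Python sketch below checks that reading against the Abstract Spelling Patterns Test rows; the function name is hypothetical, and the input values (vowel means of 10.0 and 17.1, total means of 14.7 and 25.1) are as read from a partly illegible table.

```python
def percent_of_total(part_mean: float, total_mean: float) -> int:
    """Express one error category's mean as a whole-number percent of the total."""
    return round(100 * part_mean / total_mean)


# Vowel errors as a share of total errors, good and poor spellers respectively.
print(percent_of_total(10.0, 14.7))
print(percent_of_total(17.1, 25.1))
```

Both groups come out at 68 percent, which agrees with the statement in the text that the error pattern was the same for good and poor spellers.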

On the remaining linguistic tasks good spellers continued to outperform
poor spellers. On the test of suffixation, poor spellers made significantly
more incorrect responses than the good spellers, t(28) = 6.08, p < .001.
Similarly, in representing the reduced vowel in various pseudowords, poor
spellers made significantly more errors, t(28) = 7.29, p < .001.
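
As a rough check on how the reported t values relate to the summary scores in Table 2, the statistic for two independent groups can be recomputed from the means and standard deviations alone. The sketch below is illustrative rather than a reconstruction of the original analysis: it assumes two groups of 15 subjects each (inferred from the 28 degrees of freedom, not stated in this section), and the function name is hypothetical.

```python
from math import sqrt


def t_from_summary(mean_a, sd_a, mean_b, sd_b, n_per_group):
    """Independent-groups t for two equal-sized groups, from means and SDs.

    With equal group sizes the pooled-variance standard error reduces to
    sqrt((sd_a**2 + sd_b**2) / n_per_group).
    """
    standard_error = sqrt((sd_a ** 2 + sd_b ** 2) / n_per_group)
    return (mean_b - mean_a) / standard_error


# Reduced Vowel Test: good spellers 9.7 (SD 3.0), poor spellers 18.8 (SD 3.8).
t_reduced_vowel = t_from_summary(9.7, 3.0, 18.8, 3.8, n_per_group=15)
print(round(t_reduced_vowel, 2))
```

The result is about 7.28, close to the reported value of 7.29; the small gap is what one would expect from rounding in the tabled means and standard deviations.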

In contrast to the sharp differences between the groups on the tasks
assessing linguistic ability, no difference in the performance of good and
poor spellers was found on the visual memory task, t(28) = 0.30, p > .05.
This finding suggests that while the ability to remember visual information
may enhance spelling proficiency in some individuals, it may not by itself
account for the performance differences observed in this sample of college
students.

Performance on reading and vocabulary tasks. It was also of interest to
determine whether the two groups of university students could be distinguished
on tests of reading ability. Whereas both good and poor spellers demonstrated
college level proficiency in reading English words and in verbal scholastic
aptitude, good spellers were distinctly superior to poor spellers in both
these areas. As shown in Table 3, on the reading subtest of the WRAT good
spellers attained a mean grade equivalent score two years above that achieved
by the poor spellers (15.3 years versus 13.3 years, respectively). Differences
between the groups in reading ability were found both on the WRAT test
of oral reading, t(28) = 3.49, p < .00^, and on comprehension of printed text,
as indexed by the verbal aptitude score on the Scholastic Aptitude Test.
Together these results suggest that the linguistic abilities associated with
differences in spelling proficiency may also contribute to differences in
broader aspects of skill in written language. The fact that reading ability,
as it was assessed on these two measures, was less conspicuously retarded than
the spelling performance of the poor spelling group may stem from the fact
that reading is a recognition task and, as such, provides more opportunities
than are available in spelling for arriving at the correct answer by using
contextual cues. The easier demands made by reading may therefore mask the
difficulties that more readily surface in written language tasks requiring
production.

                                Table 3

               Summary Scores for Good and Poor Spellers
                  on Reading and Vocabulary Measures

                                Good Spellers          Poor Spellers
                              Mean    Standard       Mean    Standard
Measure                       Score   Deviation      Score   Deviation

1. WRAT Reading
   Grade Equivalent           15.3      1.3          13.3      1.7

2. Scholastic Aptitude Test
   Verbal Aptitude            53^      75.8          465      66.7

3. WAIS Vocabulary Subtest
   Scaled Score               14.6      1.8          13.5      1.5

In contrast, it is notable that no reliable difference between the groups
was obtained on the WAIS Vocabulary Subtest, t(28) = 1.92, p > .05. This
finding suggests that performance on a measure commonly used to assess verbal
intelligence is not a factor associated with differences in spelling
proficiency. Instead, the findings point to a deficiency on the part of poor
spellers in the ability to apprehend the internal structure of words.

As anticipated, the results revealed that good spellers were consistently
more sensitive than poor spellers to the structural principles embodied in the
English-like pseudowords. Not only were good spellers significantly better in
recognizing acceptable spelling patterns for English morphophonemes, they were
also more proficient in appending both prefixes and suffixes to words and in
using morphophonemic information to correctly represent phonetically neutral,
reduced vowels. The finding that good spellers were able to derive the
correct spelling for the pseudowords suggests that their earlier success in
spelling the real words on the Experimental Spelling Test was not entirely the
result of whatever ability they might have to memorize the spellings of specific
words. Indeed, it would seem more reasonable to suppose that good spellers
have succeeded in abstracting regularities that are instanced in the
orthography and have learned to exploit this knowledge when called upon to
spell. This finding is consistent with the results of a few studies that have
addressed this question (Fowler, Liberman, & Shankweiler, 1977; Schwartz &
Doehring, 1977). The fact that poor spellers performed as poorly on the
abstract spelling tasks as they did on the familiar words of the first
experiment suggests that they are either less sensitive than good spellers to
the uniformities that underlie English orthography or are less apt than good
spellers to access this knowledge in transcribing words.

General Discussion 

The misspellings of college students provide insight into the nature of
spelling difficulty and offer a means for identifying those abilities that
underlie competence in spelling English words. The findings of this
investigation suggest that sensitivity to linguistic structure is a critical
component of spelling proficiency and may account for much of the variation
between otherwise literate adults who differ in spelling achievement. The data
presented here revealed that college-level students who differed greatly in
spelling proficiency also differed in their sensitivity to various
regularities of word structure. Poor spellers were not only less able than
good spellers to abstract the orthographic regularities existing at the
surface phonetic level of language, but were also less successful in
penetrating below the phonetic surface of words to the underlying
morphophonemic representations that are captured in a word's written form.
Indeed, it was the ability to access and utilize morphophonemic knowledge,
both in spelling actual words and in spelling English-like pseudowords, that
most clearly differentiated good and poor spellers. The finding that these
performance differences are found with pseudowords implies that the knowledge
that contributes to linguistic sensitivity is of a generalized sort that can
be applied to new words.

It was apparent in questioning good spellers that their linguistic
sensitivity was often not manifested in an explicit form that could be
verbalized. Although, in some instances, individuals could describe the
principles underlying their choice of a particular spelling pattern, in many
other instances they were unable to explain how their choices were made. This
suggests that linguistic sensitivity involves tacit knowledge as well as a more
explicit understanding of how written language maps onto its spoken form. By
exploiting this knowledge good spellers were able to avoid many pitfalls in
spelling that proved to be insurmountable to subjects lacking in this
sensitivity, as, for example, the representation of reduced vowels and the
affixation of prefixes and suffixes to base morphemes.

This investigation suggests that some college students have inadequately
learned the principles by which writing represents the language, despite the
lack of apparent deficits in reading. Of course, it is not surprising that
reading would be easier than spelling, since reading is a recognition task
that provides multiple cues and requires only a passive recognition of
spelling patterns.

The possibility exists that some poor spellers may be experiencing
difficulty not because they are insensitive to the various kinds of
regularities existing at different levels of linguistic structure, but because
they fail to apply this knowledge in spelling. It would be of interest to
determine whether poor spellers could appreciably improve their spelling
accuracy after receiving some instruction about how their linguistic
competence might assist them in deriving the orthographic representation of
words.

It is, of course, unlikely that differential access to linguistic
structure can account for all variations in spelling proficiency. Other
investigators have found spelling difficulties in some individuals to be
associated with underlying deficits in serial ordering ability (Kinsbourne &
Warrington, 1961; Orton, 19iir; Lecours, 1966) or with dysfunctions in aspects
of visual or auditory perception (Critchley, 1970; Boder, 1973). However, these
investigations were conducted either on children with developmental dyslexia
or on adults with acquired dyslexia following brain damage. Therefore the
findings of these studies may be of limited relevance to the questions with
which this study is concerned. Although some writers have proposed that
individual variation in the spelling proficiency of adults is largely the
result of differences in visual memory (Shaw, 1965; Witherspoon, 1973), no
evidence of differences related to visual memory was found among the good and
poor spellers in this study.

At all events, it is clear that competence in spelling involves more than
the memorization of words. It requires the ability to abstract regularities
instanced in word structure at several levels of representation. At the most
basic level, it entails abstracting the spelling patterns that stand in
approximate correspondence to the phonemes of English. At the morphemic level,
it requires learning English morphemes and the conventions for combining
them ... phonetic form. The latter abilities especially are critical for
productive use of the orthography and seem to be lacking in many otherwise
literate adults who are unable to spell proficiently.

The findings of this investigation serve to emphasize that spelling is
not a skill that is fully acquired as a part of an elementary education. Many
young adults continuing on in higher education have persistent spelling
problems. The study has produced evidence that spelling is not an isolated
skill but, like other aspects of writing skill, draws upon a variety of
linguistic abilities, which continue to develop with experience, and
which may be poorly developed even in highly selected college students. The
findings lend substance to the claim (Chomsky, 1970) that some abilities
required for full use of an alphabet are rather late intellectual
developments.
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Footnotes 

*The nature of the mapping between phonemes and their graphemic
representations is the subject of considerable debate, particularly in the
case of the so-called "silent" letters. Whereas some silent letters (such as
the e in make, life, and code) function as diacritic markers for a preceding
vowel phoneme and as such may readily be classified as part of the vowel
spelling, others serve no obvious function (e.g., the b in lamb or the u in
guard). In such instances it is not clear with which phoneme the grapheme is
to be associated. We have followed Hanna et al. (1966) in classifying "silent"
consonant graphemes with consonant phonemes and "silent" vowel graphemes with
vowel phonemes. According to this procedure the gn in gnaw is treated as a
single spelling pattern. Thus, any of the following spellings for /n/ would
be scored as substitution errors (nn, kn, pn, mn) since like gn they are
alternative spelling patterns for /n/.
1. Words of one syllable ending in a single consonant that follows a single
   vowel double the final consonant before a suffix beginning with a vowel.
   Examples include: clannish, strapped, sobbing, and thinned (Witherspoon,
   1973).

2. Words ending in silent e usually drop the e before a suffix beginning
   with a vowel. However, words ending in ce and ge, and a few other words,
   do not drop the silent e before a suffix beginning with certain vowels.
   Examples include: changeable and noticeable (Witherspoon, 1973, p. 282).

3. Words ending in silent e preceded by one or more consonants usually
   retain the e before a suffix beginning with a consonant. Examples
   include: sincerely, ninety, and definitely (Witherspoon, 1973, p. 283).

4. In American usage, the final e is usually dropped before the suffix -ment
   when preceded by dg. An example is abridgment (Witherspoon, 1973,
   p. 284).

5. Final y following one or more consonants changes to i before the addition
   of letters other than i. Examples include: flier and skies
   (Witherspoon, 1973, p. 284).

6. Words ending in c add k before an additional syllable beginning with e,
   i, or y. An example is picnickers (Witherspoon, 1973, p. 284).

7. In combinations with full, the second l of the word full is dropped when
   the word is used as a suffix. An example is skillful (Witherspoon, 1973,
   p. 285).

8. I before e except after c, or when sounded as a, as in neighbor or weigh.
   Examples include: disbelieve, beige, and unperceived (Witherspoon, 1973,
   p. 276).

9. Some nouns ending in o preceded by a consonant add es to form the plural.
   Others, including most musical terms that end in o, add s to form the
   plural. An example is echoes (Witherspoon, 1973, p. 294).
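
Three of the rules listed above (consonant doubling, silent e, and final y) are mechanical enough to sketch in code. The Python below is a simplified illustration under stated assumptions, not the scoring procedure of the study: the function names are hypothetical, any final e is assumed to be silent, "certain vowels" in the silent e rule is narrowed to a and o, and syllables are approximated by counting vowel letters.

```python
VOWELS = set("aeiou")


def is_one_syllable_cvc(word: str) -> bool:
    """True for a one-vowel word ending in a single vowel plus a single consonant."""
    vowel_count = sum(ch in VOWELS for ch in word)
    return (
        len(word) >= 3
        and vowel_count == 1
        and word[-2] in VOWELS
        and word[-1] not in VOWELS
        and word[-1] not in "wxy"
    )


def add_suffix(base: str, suffix: str) -> str:
    """Attach a suffix using simplified versions of three of the listed rules."""
    vowel_suffix = suffix[0] in VOWELS
    # Final y after a consonant becomes i, unless the suffix begins with i.
    if len(base) > 1 and base.endswith("y") and base[-2] not in VOWELS and suffix[0] != "i":
        return base[:-1] + "i" + suffix
    # Silent e is dropped before a vowel, but kept after c or g before a or o.
    if len(base) > 1 and base.endswith("e") and vowel_suffix:
        keeps_e = base[-2] in "cg" and suffix[0] in "ao"
        return base + suffix if keeps_e else base[:-1] + suffix
    # One-syllable words in vowel + consonant double the consonant before a vowel.
    if vowel_suffix and is_one_syllable_cvc(base):
        return base + base[-1] + suffix
    return base + suffix


for base, suffix in [("prin", "ish"), ("clan", "ish"), ("change", "able"),
                     ("sincere", "ly"), ("sky", "es"), ("fly", "er")]:
    print(base, "+", suffix, "->", add_suffix(base, suffix))
```

Because the rules operate on letter patterns rather than on stored words, the same function handles a pseudoword such as prin and a real word such as clan in the same way, which is the kind of generalization the suffixation task was designed to probe.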

LEVEL 2B

1. Words of more than one syllable, ending in a single consonant preceded by
   a single vowel, if accented on the last syllable usually double the final
   consonant before a suffix beginning with a vowel. Examples include:
   preferring, omitted, equipped, and regrettable (Witherspoon, 1973).

When a prefix ends with the same letter with which the root to which it
is to be united begins, retain both letters in spelling the word.
Examples include: misspell and dissimilar (Witherspoon, 1973, p. 277).

When the prefix /ad/ is appended to base words beginning with the letters
[...], the d is assimilated and is orthographically
represented by the letter beginning the base word. An example is
aggravate (Webster, 1963, p. 10).

When the prefix /con/ is appended to base words beginning with either
l, m, or r, the n is assimilated and is orthographically represented by the
letter beginning the base word. Examples include: commemorate and
commiserate (Webster, 1963, p. 164).

The identity of reduced vowels within words can often be recovered by
relating the word to cognate forms in which the same vowel segment is not
reduced. Examples include: grammar - grammatical, continuance -
continuation, inspiration - inspire, repetition - repeat.

If the root forms its noun by the immediate addition of -ion, the correct
ending is likely to be -ible. There are, however, exceptions. Examples
include: indigestible and inexhaustible (Lewis, 1962, p. 103).

If the root ends in -ns, the ending is probably -ible. An example is
defensible (Lewis, 1962, p. 103).

If the root to which the suffix is to be added is a full word in its own
right, the correct ending is usually -able. An example is regrettable
(Lewis, 1962, p. 1).

If a two-syllable verb ending in -er is accented on the first syllable,
the noun ending is likely to be -ance. An example is utterance (Lewis,
1962, p. 13).

If a verb ends in -ear, the likely ending is -ance. An example is
clearance (Lewis, 1962, p. 13).
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WORD LIST 








     LEVEL 1                  LEVEL 2A                 LEVEL 3

 1.  yam                  1.  strapped             1.  chihuahua
 2.  inflate              2.  skillful             2.  onomatopoeia
 3.  adverb               3.  cancelled            3.  Fahrenheit
 4.  vortex               4.  picnickers           4.  plagiarism
 5.  cameo                5.  abridgment           5.  sarsaparilla
 6.  harp                 6.  flier                6.  hemorrhage
 7.  terminates           7.  changeable           7.  sergeant
 8.  [illegible]          8.  sincerely            8.  eunuch
 9.  vacate               9.  echoes               9.  connoisseur
10.  update              10.  disbelieve          10.  mnemonic
11.  vibrated            11.  sobbing             11.  reveille
12.  mandated            12.  beige               12.  desiccate
13.  [illegible]         13.  skies               13.  [illegible]
14.  [illegible]         14.  [illegible]         14.  [illegible]
15.  zebra               15.  clannish            15.  sacrilegious
16.  [illegible]         16.  noticeable          16.  diphtheria
17.  [illegible]         17.  ninety              17.  hieroglyphic
18.  boxer               18.  thinned             18.  thumb
19.  [illegible]         19.  basically           19.  gnaw
20.  intertwined         20.  definitely          20.  lengthen

                              LEVEL 2B

21.  uncover              1.  misspell            21.  [illegible]
22.  diplomat             2.  aggravate           22.  soldered
23.  retort               3.  commemorate         23.  talker
24.  canister             4.  defensible          24.  [illegible]
25.  clustering           5.  grammar             25.  annihilate
26.  undiminished         6.  clearance           26.  [illegible]
27.  terminology          7.  inexhaustible       27.  kaleidoscope
28.  mask                 8.  utterance           28.  [illegible]
29.  manifestation        9.  continuance         29.  [illegible]
30.  definitions         10.  prevalent           30.  thigh
31.  frustrated          11.  dissimilar          31.  listener
32.  expectation         12.  preferring          32.  daughter
33.  alternate           13.  inspiration         33.  indebted
34.  stimulation         14.  omitted             34.  climb
35.  examiner            15.  repetition          35.  answering
36.  preventive          16.  indigestible        36.  knock
37.  unemployment        17.  recommend           37.  beautifully
38.  punishment          18.  regrettable         38.  laugh
39.  establishing        19.  equipped            39.  folk
40.  electronics         20.  commiserate         40.  tongue
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EFFECTS OF PHONOLOGICAL AMBIGUITY ON BEGINNING READERS OF SERBO-CROATIAN* 

Laurie B. Feldman,† G. Lukatela,†† and M. T. Turvey†††



Abstract. Third- and fifth-grade Yugoslavian children were tested
on rapid naming of familiar words and unfamiliar pseudowords that
were a) written in either the Roman alphabet or the Cyrillic alphabet
and b) were either phonologically ambiguous or not. Phonological
ambiguity was produced by using letter strings that, when transcribed
in Roman or when transcribed in Cyrillic, contained one or
more ambiguous characters. Ambiguous characters are those letters
shared by the two alphabets that receive different phonemic
interpretations in the two alphabets. The controls for phonologically
ambiguous words were the same words in their alternative,
non-ambiguous alphabetic transcription. Consistent with previous
experiments on adults, the phonologically ambiguous form of a word
or pseudoword was named much more slowly than the phonologically
unambiguous form. For children who were equally proficient in both
Roman and Cyrillic, the effect of phonological ambiguity was greater
as children named letter strings faster. If it can be assumed that
reading fluency correlates with naming latency, then it can be argued
that the better beginning reader is more phonologically analytic.

The present paper reports an experiment on the rapid naming of printed
letter strings by Yugoslavian children. In Yugoslavia, children are taught
two alphabets: a Roman alphabet (the characters of which would be fairly
familiar to the reader of English) and a Cyrillic alphabet (the characters of
which are similar to but not identical with Russian script). Ordinarily,
Yugoslavian children learn both alphabets by the end of the second grade and
are reasonably proficient in both by the fifth grade. (In Belgrade, where
most of the children in the present experiment were educated, the Cyrillic
alphabet is taught first.) Unlike the English writing system, the two writing
systems of the Serbo-Croatian language maintain strict grapheme-phoneme
correspondences; the phonemic interpretation of a letter does not vary with
context and there are no letters made silent by context. Nevertheless,
confusion similar to that experienced by the beginning reader of English is
experienced by the beginning reader of Serbo-Croatian (Mann, Liberman, &
Shankweiler, 1980).

                              TABLE 1

                           SERBO-CROATIAN

         ROMAN                     CYRILLIC
        PRINTED                    PRINTED              LETTER NAME
UPPER CASE   LOWER CASE    UPPER CASE   LOWER CASE      IN I.P.A.

    A            a             А            а               a
    B            b             Б            б               bə
    C            c             Ц            ц               tsə
    Č            č             Ч            ч               tʃə
    Ć            ć             Ћ            ћ               tɕə
    D            d             Д            д               də
    DŽ           dž            Џ            џ               dʒə
    Đ            đ             Ђ            ђ               dʑə
    E            e             Е            е               ɛ
    F            f             Ф            ф               fə
    G            g             Г            г               gə
    H            h             Х            х               xə
    I            i             И            и               i
    J            j             Ј            ј               jə
    K            k             К            к               kə
    L            l             Л            л               lə
    LJ           lj            Љ            љ               ljə
    M            m             М            м               mə
    N            n             Н            н               nə
    NJ           nj            Њ            њ               njə
    O            o             О            о               ɔ
    P            p             П            п               pə
    R            r             Р            р               rə
    S            s             С            с               sə
    Š            š             Ш            ш               ʃə
    T            t             Т            т               tə
    U            u             У            у               u
    V            v             В            в               və
    Z            z             З            з               zə
    Ž            ž             Ж            ж               ʒə

As noted above, the Serbo-Croatian language is written in two different
alphabets, Roman and Cyrillic. The two alphabets transcribe one language and
their graphemes map simply and directly onto the same set of phonemes. These
two sets of graphemes are, with certain exceptions, mutually exclusive (see
Table 1). Most of the Roman and Cyrillic letters are unique to their
respective alphabets. However, the two alphabets share a number of letters. The
phonemic interpretation of some of these shared letters is the same whether
they are read as Cyrillic or as Roman graphemes; these are referred to as
common letters. The remaining shared letters have two phonemic
interpretations, one in the Roman reading and one in the Cyrillic reading;
these are referred to as ambiguous letters (see Figure 1). Whatever their
category, the individual letters of the two alphabets have phonemic
interpretations (classically defined) that are virtually invariant over letter
contexts. This reflects the phonologically shallow nature of the
Serbo-Croatian orthography.
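
The distinction between common and ambiguous letters can be made concrete with a small sketch. The Python below is illustrative only. The letter inventories are assumptions supplied for the example rather than taken from this section: the uppercase shapes A, E, O, J, K, M, T are treated as common, and H, P, C, B as ambiguous. The shapes are typed with Latin keys, the phoneme labels are informal, and the function names are hypothetical.

```python
# Shared uppercase letter shapes, typed here with Latin keys for convenience.
# Assumed inventories: common shapes have one reading, ambiguous shapes have two.
COMMON = {"A": "a", "E": "e", "O": "o", "J": "j", "K": "k", "M": "m", "T": "t"}
AMBIGUOUS = {  # shape: (Roman reading, Cyrillic reading)
    "H": ("x", "n"),
    "P": ("p", "r"),
    "C": ("ts", "s"),
    "B": ("b", "v"),
}


def readings(string: str):
    """Return (Roman, Cyrillic) readings, or None if a letter is unique to one alphabet."""
    roman, cyrillic = [], []
    for shape in string.upper():
        if shape in COMMON:
            roman.append(COMMON[shape])
            cyrillic.append(COMMON[shape])
        elif shape in AMBIGUOUS:
            roman_sound, cyrillic_sound = AMBIGUOUS[shape]
            roman.append(roman_sound)
            cyrillic.append(cyrillic_sound)
        else:
            return None  # contains a letter that belongs to only one alphabet
    return "".join(roman), "".join(cyrillic)


def is_phonologically_ambiguous(string: str) -> bool:
    """True when the string is all shared letters and its two readings differ."""
    pair = readings(string)
    return pair is not None and pair[0] != pair[1]


print(readings("BETAP"))                      # two different readings
print(is_phonologically_ambiguous("MAMA"))    # common letters only
print(readings("VETAR"))                      # V and R are uniquely Roman here
```

A string built only from common letters yields the same reading in both alphabets, whereas a single ambiguous letter is enough to give a string two competing phonemic interpretations.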



[Figure: Serbo-Croatian alphabet, uppercase. The diagram groups the letters
into uniquely Cyrillic letters, uniquely Roman letters, and the letters shared
by the two alphabets ("common letters" and ambiguous letters).]

Figure 1. Letters of the Roman and Cyrillic alphabets.



The present experiment exploits this limited but explicit ambiguity in
the Serbo-Croatian writing system. It does so to address the question of
whether or not skilled beginning readers who have learned both the Roman and
Cyrillic alphabets can be distinguished from less skilled readers by their
sensitivity to phonological ambiguity: In rapidly naming letter strings, is
the better beginning reader more hampered by the presence of phonologically
ambiguous characters than the poorer beginning reader? The question takes
this latter form for two reasons. First, accessing the name of a letter
string may entail a phonologically analytic strategy, especially when the
orthography is as regular as the Serbo-Croatian orthography (Turvey, Feldman,
& Lukatela, 1984). Second, facility with a phonologically analytic strategy
for naming (and, more generally, for accessing the internal lexicon) may be
one way to distinguish the more skilled reader from the less skilled reader.*
Consequently, for these two reasons, it may be supposed that in Serbo-Croatian
the more skilled the beginning reader, the greater is his or her sensitivity to
phonological ambiguity.

A similar strategy has been pursued by I. Y. Liberman, Shankweiler, and
their colleagues to distinguish good and poor readers of English by their
ability to use phonetic coding in the short-term retention of linguistic
materials presented visually or auditorily. The general result obtained by
these investigators is that good readers perform proportionately worse than
poor readers when the to-be-remembered stimuli are phonetically similar
compared to when they are phonetically dissimilar (Mann, Liberman, &
Shankweiler, 1980; Shankweiler, Liberman, Mark, Fowler, & Fischer, 1979). That
is, although good readers tend to do better in short-term memory tests than
poor readers, the scores of good readers are influenced more by phonetic
similarity.*

Outside of the short-term memory task, however, evidence for a difference
between good and poor readers of English that is based on a difference in
sensitivity to the linguistic underpinnings of the orthography is both sparse
and equivocal. For example, Barron (1978) showed that visually presented
pseudohomophones (e.g., BRANE, WERD) lengthened the lexical decision latencies
of good readers but not of poor readers. It is difficult, however, to draw
conclusions about linguistic contributions to visual word processing on the
basis of pseudohomophone effects for the following reasons. First, there is
the possibility that the phonetic interpretations assigned to pseudohomophones
(e.g., BRANE) and to their related words (e.g., BRAIN) may be sensitive to the
orthographic differences between them. Second, even if a pseudohomophone and
its related word were assigned identical phonetic interpretations, it does not
mean that they would be assigned identical phonological interpretations. (In
formal linguistics, the phonetic and phonological representations of an
English word are distinct.) Third, it is frequently the case that the
pseudohomophones used in experiments are visually less similar to English
words (i.e., orthographically less well structured) than are the control
pseudowords (Martin, 1982).

Speaking more generally, reliable demonstrations of a linguistic
contribution (e.g., phonological) to visual lexical access with English
materials have proven hard to come by, regardless of the age and fluency of
the reader. This fact has been interpreted to mean that accessing the lexical
representation or name of printed English words is ordinarily a linguistically
nonanalytic process, often termed visual (e.g., Coltheart, 1978).
Alternatively, it could be interpreted to mean that, within the confines of
the experimental procedure for studying lexical access and naming, it is
difficult to find a manipulation of English stimulus materials that
consistently reveals a linguistic contribution.

Results of research on lexical access and naming with the Serbo-Croatian
language contrast sharply with the results of research with English. It has
been shown repeatedly that in tasks where the lexical status of a letter
string has to be provided rapidly, the presence of ambiguous letters has a
retarding effect. A phonological contribution is consistently implicated
(Feldman, 1983). The basic experimental procedure has been to compare two
kinds of letter strings: (1) phonologically unambiguous letter strings,
comprised of letters unique to an alphabet as well as letters shared by the
two alphabets (see Figure 1), and (2) phonologically ambiguous letter strings,
comprised solely of letters shared by the two alphabets and always including
one or more ambiguous letters. The first kind of letter string can be read in
only one way and has a single morphophonological representation. In contrast,
the second kind of letter string can be read in two ways because it is written
in the letters shared by the two alphabets, some of which are phonemically
bivalent; a letter string of this kind has two distinct morphophonological
representations. If lexical access and naming proceed with reference to the
phonology, then a phonologically ambiguous letter string might be expected to
extend response time relative to a letter string that receives a unique
morphophonological representation. This hypothesis has been evaluated in two
ways: via a comparison of different letter strings (Lukatela, Popadić,
Ognjenović, & Turvey, 1980; Lukatela, Savić, Gligorijević, Ognjenović, &
Turvey, 1978) and via a comparison of different versions (Roman and Cyrillic)
of the same letter string (Feldman, 1981; Feldman, Kostić, Lukatela, & Turvey,
1983; Feldman & Turvey, 1983).
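
The distinction between the two kinds of letter strings can be made concrete
with a small sketch. It is an illustration only, not part of the experimental
materials: the letter inventory is inferred from the examples given in this
report (CABAHA/SAVANA, BATAK, EKCEP/EKSER), the glyphs are written with Latin
look-alike characters, and the names COMMON, AMBIGUOUS, classify, and readings
are hypothetical.

```python
# Letter inventory inferred from the examples in the text, written with
# Latin look-alike glyphs (an assumption made for illustration).
COMMON = set("AEOJKMT")   # same shape, same sound in both alphabets
AMBIGUOUS = {             # same shape, different sound: (Roman, Cyrillic)
    "B": ("b", "v"),
    "C": ("c", "s"),
    "H": ("h", "n"),
    "P": ("p", "r"),
}


def classify(letter_string):
    """Label a letter string 'ambiguous' or 'unambiguous'."""
    letters = letter_string.upper()
    # Ambiguous strings use shared letters only ...
    shared_only = all(ch in COMMON or ch in AMBIGUOUS for ch in letters)
    # ... and contain at least one phonemically bivalent letter.
    has_bivalent = any(ch in AMBIGUOUS for ch in letters)
    return "ambiguous" if shared_only and has_bivalent else "unambiguous"


def readings(letter_string):
    """Return the (Roman, Cyrillic) readings of a string of shared letters."""
    roman, cyrillic = [], []
    for ch in letter_string.upper():
        if ch in AMBIGUOUS:
            r, c = AMBIGUOUS[ch]
        elif ch in COMMON:
            r = c = ch.lower()
        else:
            raise ValueError(f"{ch!r} is not shared by the two alphabets")
        roman.append(r)
        cyrillic.append(c)
    return "".join(roman), "".join(cyrillic)


for item in ("CABAHA", "SAVANA", "BATAK"):
    print(item, classify(item))
print(readings("CABAHA"))
```

A string such as SAVANA is unambiguous under this sketch because S, V, and N
belong to the Roman alphabet alone, whereas CABAHA receives two readings, only
one of which corresponds to a word.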

When different words are compared, problems of matching the words on
frequency of occurrence in the language, richness of meaning, length, number
of syllables, etc. arise. These problems can be virtually eliminated by taking
advantage of the fact that some Serbo-Croatian words can be transcribed in the
Roman and Cyrillic alphabets such that in one alphabet the reading is
phonologically ambiguous, whereas in the other alphabet the reading is
phonologically unique. To evaluate the phonological contribution to lexical
access and naming, the bialphabetical nature of Serbo-Croatian permits a
comparison of a written word with itself.

Consider the Serbo-Croatian word for savanna. This word is phonologically
bivalent when transcribed in Cyrillic (CABAHA, where C, B, and H are
ambiguous) and phonologically unique when transcribed in Roman (SAVANA). The
expectation that lexical decisions on, and the naming of, letter strings like
CABAHA should be significantly slower than the same responses to letter
strings like SAVANA has been confirmed experimentally (Feldman, 1981; Feldman
et al., 1983; Feldman & Turvey, 1983). To reiterate, the letter strings
CABAHA and SAVANA are the same word and, therefore, identical in all respects
but one, namely, the number of morphophonological representations. It is,
therefore, a noteworthy empirical observation that their associated latencies
should differ by hundreds of milliseconds.

The design used in the present experiment with children was modeled after
that used in the experiments with adults by Feldman and her colleagues (see
above). Because mastery of both the Roman and Cyrillic alphabets is an
essential prerequisite for the appreciation of bivalence, children were tested
at two levels of alphabetic proficiency: 6 months and 30 months after they had
learned the second alphabet. All children were tested on words and pseudowords
that were phonologically ambiguous when transcribed in one of the two
alphabets. The children's naming latencies and erroneous responses to these
ambiguously transcribed letter strings and to their unambiguously transcribed
controls were compared. In the experiment, the question raised above about
phonological ambiguity and beginning readers of Serbo-Croatian took the form:
With alphabetic proficiency controlled, is the latency (and/or error)
difference between naming ambiguous and unambiguous versions of the same word
(that is, the effect of phonological ambiguity) larger for the child whose
reading skills are superior?

Method 

Subjects 

In order to include a range of reading ability at two levels of alphabetic
proficiency, third- and fifth-grade students from the Svetozar Miletić
School in Zemun, a suburb of Belgrade, participated in the study. The sample
consisted of two complete classes at each grade level. As is the practice in
Yugoslavia, these classes were not grouped by reading ability. Based on their
own accounts, 85% of the children had learned the Cyrillic alphabet in the
first grade and the Roman alphabet in the second. For the remaining 15%, the
order of acquisition was reversed. When asked to write out their name, 95% of
third graders and 73% of fifth graders chose to write in Cyrillic. Initially,
40 third graders and 37 fifth graders were tested. Three students were
eliminated from the study because they often hesitated and triggered the voice
key before actually initiating articulation. Two students were eliminated due
to a preponderance of technical errors. Another four students were randomly
eliminated in order to yield an equal number of subjects in each condition.
Data from 34 students at each grade level were included in the analysis.

Materials

Two sets of letter strings were presented to each child. These included
a pretest composed of 20 orthographically regular and unambiguous pseudowords
all written in Cyrillic. After a brief pause, this was followed by a mixed
test list. The test included 40 words and 40 pseudowords. Half of the letter
strings were ambiguous and half were unambiguous. Among the ambiguous words,
half were words by their Roman reading (and pseudowords by their Cyrillic
reading), e.g., BATAK, and half were words by their Cyrillic reading (and
pseudowords by their Roman reading), e.g., EKCEP. Among the ambiguous
pseudowords, both alphabet readings were phonologically acceptable but
meaningless.

stimulus items were constructed so that each word and pseudoword could be 
written in two forms: A phonologically ambiguous form and the unique alphabet 
transcription of >that same word. For the ambiguous words, half of thojjnique 
alphabet transcr if>^ions were in Roman and half in Cyrillic. Analogously for 
the ambiguous pseudowords, najlf of the unique alphabet transcriptions were in 
Rooian and half in Cyrillic. For the ambiguous pseudowords, however, the 
unique 'alphabet transcription was arbitrarily designated because there was no 
preferred phonological interpretation based on lexicallty to vhich it :ieed 
correspond.** In summary, there wer^ four typeo of words (ambiguous/pure x Ro- 
man/Cyrillic) and three types of pseudowords (ambiguous/pure Roman/pure Cyril-- 
lie). Each child viewed words and pseudowords of ench type and different 
forms of the same item were presented to different groups of children (see 
Table 2). Examples of the ^40 words and ''0 pseudowords and their disj^ribut ion 
across groups is sunsnarlzed in table ^ . 
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Lexloallty 



WORD 



Table 2 . 

Exaaplea of AMBIGUOUS and UNIQUE letter strings 
and their distribution across groups of subjects* 



Alphabet 
ROMAN 

CYRILLIC 



Phonology Form Group 1 



Anbiguous 
Unique 

Ambiguous 
Unique 



BATAK drumstick 

EKSER nail 

BETAP wind 

KOBAU hawk 



Form Group 2 

KOBAC hawk 
VETAR wind 

EKCEP nail 
BATAK drumstick 



PSEUDOWORD 



ROMAN 



CYRILLIC 



Ambiguous 
Unique 

Ambiguous 
Unique . 



BOPAM* 

ROJOS 

CEMOH» 
XABOT 



HA BOX 
SEMON 

POJOC 
BOnAM 



•Classification of these letter strings distinguish only In the randomly as- 
signed alphabet of their unique alphabet transcription (see text). 



[Figure 2: panels THIRD GRADE and FIFTH GRADE; legend AMBIGUOUS ROMAN,
AMBIGUOUS CYRILLIC, PURE CONTROL; example items BATAK and EKCEP.]

Figure 2. Mean reaction time for third and fifth graders to name AMBIGUOUS
(Roman and Cyrillic) words and the UNAMBIGUOUS alphabet transcription of the
same words.

All letter strings had between three and five letters and the proportion
of items with three, four, and five letters was balanced across words and
pseudowords in the test list. All letter strings in the pseudoword pretest
contained four or five letters. All words in the test list were familiar to
third- as well as fifth-grade students as judged by their teachers' assessment
and by a frequency count based on children's texts in Serbo-Croatian (Lukić,
1970).

Procedure

The children performed a naming task on two lists of items, a pretest and
a test. They read each word aloud as it appeared, projected onto a screen
placed about 1 m in front of the child. Reaction time was measured from
stimulus onset by a voice key. One experimenter recorded latencies and marked
errors while a second experimenter noted the errors in more detail. In the
pretest, children were instructed to read each letter string as accurately as
possible. They were told that all items were pseudowords composed of four or
five letters, printed in Cyrillic. After the pretest, instructions were
modified to stress speed as well as accuracy. The children were also informed
that the next list would be composed both of meaningful words and of letter
strings that had no meaning. Further, they were cued that some of these
would be printed in Cyrillic and some in Roman. Finally, they were instructed
at the outset and prompted through the course of the test list to read
ambiguous words by their word reading when one existed, i.e., to read BATAK as
/batak/ meaning "drumstick," not as /vatak/, which is meaningless. (Only the
word readings were treated as correct responses.) In summary, the ambiguous
and unique forms of each word were distributed across two groups of subjects
so that no subject saw two forms of the same word but all subjects saw ten
ambiguous forms and ten unique forms in each alphabet. Pseudowords were
designed in an analogous manner although there was no real distinction between
the two alphabets for ambiguous pseudowords. Finally, practice items occurred
at the beginning of both the pretest and the test list.

Results

All correct reaction times were included in the analysis of variance.
Because there was a high proportion of slow latencies, some as long as 4000
ms, median reaction times were entered into the analysis of variance.
Separate analyses were performed on the word and pseudoword data. In order to
capture any pattern revealed by the extended latencies, a second set of error
analyses was performed including incorrect responses and correct responses
that were slower than 2500 ms. For both the median and the error data on
words, analyses based on subject variability and on item variability (in
parentheses) are reported.

The analysis of median reaction times for words revealed significant main
effects for three variables: Grade, Alphabet, and Phonology. Inspection of
the means in Figure 2 shows a significant effect of grade, that is, third
graders were slower than fifth graders, F(1,66) = 5.58, MSe = 281883.0, p <
.05 (F(1,38) = 41.60, MSe = 15329.8, p < .001). In addition, there was a
significant effect of alphabet: Cyrillic words were named faster than Roman
words, F(1,66) = 6.04, MSe = 13753, p < .05 (F(1,38) = 0.74, MSe = J!i92078,
p < .60). Most important, there was a significant effect of phonology:
ambiguous words were slower than the unique alphabet transcription of the same
word, F(1,66) = 38.56, MSe = 16175.9, p < .001 (F(1,38) = 13.03, MSe =
35081.2, p < .001). In the subjects analysis (but not in the items analysis)
two two-way interactions were significant or nearly so. The interaction of
phonology by alphabet suggested that overall, the effect of ambiguous
phonology might have been more robust in Roman than in Cyrillic, F(1,66) =
3.37, MSe = 20080.2, p < .06. And, means for the interaction of grade by
alphabet indicated that third graders were slower on all Roman print than on
all Cyrillic but that fifth graders read aloud comparably in both, F(1,66) =
3.84, MSe = 13753.8, p < .05. The three-way interaction of alphabet by
phonology by grade missed significance, however.

The analysis of variance on incorrect and extended responses to words
provided a pattern similar to that for reaction time. Third graders performed
less well than fifth graders, F(1,66) = 7.90, MSe = 2.0879, p < .01 (F(1,38) =
6.<K), MSe = .9993, p < .05). Ambiguous words were more likely to elicit
incorrect responses than unambiguous words, F(1,66) = 234.75, MSe = .9247,
p < .001 (F(1,38) = 52.05, MSe = ^1.9967, p < .001). There was, however, no
main effect of alphabet. In the analysis of errors, the difference between
third and fifth graders was larger for ambiguous words than for pure words as
the interaction of phonology by grade indicated, F(1,66) = 12.03, MSe = .9247,
p < .001 (F(1,38) = 4.22, MSe = 1.1598, p < .05). Moreover, as indicated by
the interaction of alphabet by grade, third graders had a tendency to perform
less well in Roman than in Cyrillic, while fifth graders showed the opposite
pattern, F(1,66) = 4.47, MSe = 1.2523, p < .05 (F(1,38) = 5.63, MSe = .9993,
p < .05). Finally, the three-way interaction of phonology by grade by alphabet
was nearly significant, F(1,66) = 3.47, MSe = 1.6133, p < .06 (F(1,38) = 7.78,
MSe = 1.1598, p < .01). The mean number of errors for each condition and
grade is reported in Table 3.



Table 3 

Mean numberl of errors {and standard deviation) of response latencies 
for amplguous and unique forms of words in each alphabet. 



CYRILLIC UNIQUE 

AMBIGUOUS CONTROL DIFFERENCE 

(EKCEP) (EKSER) 



ROMAN UNIQUE 
AMBIGUOUS CONTROL DIFFERENCE 
(BATAK) (BATAK) - 



THIRD 

GRADE 2.32^. 0.K7 1.85 2.91 0.H1 2.53 

(361)° (293) (H10) (368) 

FIFTH 

GRADE '2.00 O.38 1,62 1.il7 0.32 1 .15 

(213) (151) (201) (183) 

^rors 

Standard Deviation of Latencies 
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♦ 

The analysis of median pseudoword latencies including three types of
pseudowords (ambiguous, unique Cyrillic, unique Roman) indicated a significant
main effect of type of pseudoword, F(2,132) = ar^o**, MSe = itmo,$, p < .001,
and a marginally significant effect of grade whereby third graders were slower
than fifth graders, F(1,66) » MSe = «6S15/>, p < .06. Mean pseudoword naming
times in ms for third graders were 1308 for Cyrillic, 1286 for Roman, and 1574
for ambiguous letter strings; corresponding times for fifth graders were
1204, 1097, and 1524, respectively. Post hoc tests indicated that unique
Roman and unique Cyrillic forms did not differ for third graders, F(1,66) =
.12, but that unique Roman forms were significantly faster than unique
Cyrillic forms for fifth graders, F(1,66) = 7.1?k, p < .009.

Several analyses of variance suggested that alphabetic proficiency as
indexed by interactions of alphabet by grade figured prominently in the
pattern of results. In general, performance of fifth graders was equivalent
with Roman and Cyrillic prints while third graders displayed weaker
performance with Roman than with Cyrillic. In the analyses that follow,
proficiency with each of the two alphabets was not confounded with measures of
reading skill; the relation between reading skill and sensitivity to
phonological ambiguity (which depends on the ability to derive two
phonological interpretations for a letter string, one in each alphabet) was
addressed separately at two levels of bialphabetic proficiency.

The Relation of Ambiguity to Decoding Speed and Error

For each subject, the difference in naming time for ambiguous and unique
words was computed separately for Roman words, for Cyrillic words, and for
their combined effect. These provided indices of the effect of phonological
ambiguity. In addition, the median latency on the pretest with unambiguous
Cyrillic pseudowords was computed for each child. Given that naming time for
individual letters and pseudowords has been shown to correlate with reading
skill (Jackson, 1980; Jackson & McClelland, 1979; Perfetti & Hogaboam,
1975), the median pretest latency can serve as an index of reading
proficiency. The ambiguity scores were then correlated with the pretest
latencies for 33 third graders and 34 fifth graders. (A reading skill measure
was missing for one third grader.) Correlations were computed separately for
both grades, since grade provided an index of bialphabetic proficiency.
Moreover, separate correlations insured against a correlation produced by
sampling from two extreme groups because third graders were generally slower
than fifth graders. The correlation between the degree to which naming was
slowed down by ambiguity and reading skill (as indexed by the pseudoword
naming task) was significant for Cyrillic words alone for third graders, r =
-.430, p < .05, and nearly significant for fifth graders, r = -.297, p < .10.
The correlation between ambiguity and reading skill for Roman words alone was
nonexistent for third graders and nearly significant for fifth graders, r =
-.274, p < .20. Finally, the correlation between ambiguity averaged over
alphabets and reading skill was not significant for third graders but was
significant for fifth graders, r = -.378, p < .05. These results are
summarized in Table 4. The negative correlations indicate that, in general,
the faster reader is more impaired by phonological ambiguity. Overall, the
effect is strongest in the Cyrillic alphabet, that is, the alphabet learned
first.
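
The computation just described can be sketched as follows. This is a minimal
illustration under stated assumptions, not the analysis actually run: the
ambiguity score is taken here to be a difference of per-subject medians, the
function names are hypothetical, and the latencies in the usage lines are
invented solely to show the direction of the relation.

```python
from math import sqrt
from statistics import median


def ambiguity_score(ambiguous_rts, unique_rts):
    """Naming-time cost of ambiguity for one subject, in ms (assumed to be
    the median ambiguous latency minus the median unique latency)."""
    return median(ambiguous_rts) - median(unique_rts)


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented data: one pretest latency and one ambiguity score per child.
pretest_latency = [800, 900, 1000, 1100]
scores = [400, 300, 250, 100]
print(pearson_r(pretest_latency, scores))
```

A negative coefficient, as in these invented data, corresponds to the pattern
reported above: children with shorter pretest latencies show the larger
ambiguity cost.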

                                 Table 4

       Correlation of reading skill with detriment due to ambiguity.

AMBIGUOUS
TRANSCRIPTION      THIRD GRADE      FIFTH GRADE      COMBINED

Cyrillic           -.430**          -.297++          -.398***
Roman              -.04             -.274+           -.011
Combined           -.28+            -.378*           -.286+

                   N = 33           N = 34           N = 67

+p < .20       *p < .05
++p < .10      **p < .02
               ***p < .01

                                 Table 5

Proportion of wrong alphabet, mixed alphabet and substitution/hesitation
errors for all words by third and fifth graders.
(Numbers represent percents.)

                              ERROR

            WRONG          MIXED          SUBSTITUTION/
GRADE       ALPHABET       ALPHABET       HESITATIONS

THIRD       12.95          3.75           3.28
FIFTH       10.8           3.05           2.18

Classification of errors showed that overall, third and fifth graders did
not differ in the types of errors they made, although third graders tended to
make more errors generally. Three types of errors were identified: 1) Reading
an ambiguous word in the wrong alphabet, for instance, giving BATAK a
meaningless Cyrillic reading when it means "drumstick" read as Roman. Note
that these errors could occur only with ambiguous words. 2) Mixing alphabets
within a word, e.g., reading one ambiguous character in Roman and the
following character in Cyrillic. 3) Hesitating, or reversing, or substituting
a different phoneme for the one that is specified. In classifying errors, a
given word for a given subject was never entered into two categories. Where an
error was classifiable in more than one way, wrong-alphabet and mixed-alphabet
designations took priority over substitutions and hesitations, but unique word
errors were necessarily of the latter variety. The error data reported below
are restricted to words, both ambiguous and unique, and are summarized in
Table 5.
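
The priority rule for classifying errors can be written out as a short
sketch. It is an illustration only: the function name and the boolean flags
are hypothetical, and the order in which wrong-alphabet and mixed-alphabet
designations are checked relative to each other is an assumption, since the
text states only that both take priority over substitutions and hesitations.

```python
def classify_error(word_is_ambiguous, wrong_alphabet=False,
                   mixed_alphabet=False, substitution_or_hesitation=False):
    """Return exactly one error category, or None for a correct response."""
    # Alphabet errors are only possible on ambiguous words, and they take
    # priority over substitutions and hesitations.
    if word_is_ambiguous and wrong_alphabet:
        return "wrong alphabet"
    if word_is_ambiguous and mixed_alphabet:
        return "mixed alphabet"
    # Any remaining error, including every error on a unique word, falls
    # into the third category.
    if wrong_alphabet or mixed_alphabet or substitution_or_hesitation:
        return "substitution/hesitation"
    return None


# An ambiguous word read in the wrong alphabet, with a hesitation as well,
# is entered once, under the higher-priority category.
print(classify_error(True, wrong_alphabet=True,
                     substitution_or_hesitation=True))
```

Because the function returns as soon as a category applies, a given response
can never be entered into two categories.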

Pure words were excluded from subsequent analyses. Separate analyses of
variance were performed for wrong alphabet and mixed alphabet errors on
ambiguous words. Inspection of mixed alphabet means in Table 6 and the
results of the analysis indicate no significant main effects or interactions.
In the wrong alphabet error analysis, there was a significant interaction of
alphabet by grade, F(1,66) = 4.94, MSe = 1.520, p < .05 (F(1,38) = 3.09,
MSe = 1.620, p < .05). Consistent with the latency data on alphabetic
proficiency described above, third graders found the Roman ambiguous words
more difficult than the Cyrillic ambiguous words, while fifth graders found
them equivalent.



                                 Table 6

         Mean number of wrong-alphabet and mixed-alphabet errors
     (and standard deviation) for ambiguous Roman and Cyrillic words

                       THIRD GRADE                  FIFTH GRADE
AMBIGUOUS          WRONG         MIXED          WRONG         MIXED
TRANSCRIPTION      ALPHABET      ALPHABET       ALPHABET      ALPHABET

CYRILLIC            .91 (1.23)        (.61)     1.22  (.91)
ROMAN              1.63 (1.36)        (.60)      .97 (1.09)
                                                              .19 (.29)



In order to ascertain whether type of error varied with reading skill,
each type of error was correlated with scores on the pretest. From the
pretest, two indices of reading skill were developed: median pretest naming
time and the number of pretest errors (substitutions, hesitations, or
reversals). As above, correlations were computed separately for grades. In
this case, number of mixed-alphabet errors correlated significantly with
reading skill. Results are summarized in Table 7. The positive correlations
indicate that for readers who are equally proficient in both alphabets, less
skilled decoders were more likely to mix alphabets within a word (Roman or
Cyrillic) than were more skilled decoders.

                                 Table 7A

       Median of Decoding Latencies (standard deviations) and their
                correlation with Mixed-Alphabet Errors.

MEASURE                 THIRD GRADE     FIFTH GRADE     THIRD/FIFTH

DECODING
LATENCY (S.D.)          969 (270)       878 (154)       924 (223)

CORRELATION             .182            .365*           .250*

                                 Table 7B

  Median Decoding Errors (standard deviation) and their correlation with
                          Mixed-Alphabet Errors.

MEASURE                 THIRD GRADE     FIFTH GRADE     THIRD/FIFTH

DECODING
ERRORS (S.D.)           2.65 (1.99)     2.16 (2.19)     2.40 (2.09)

CORRELATION             .466***         .355*

                        N = 33          N = 34          N = 67

*p < .05
***p < .01

Discussion

If a reader nMed ww^Js smctly on the basis of their faalllar flgural 
a^>eots» then nailing a faMlllar printed word tranacribed with one or nore 
aibiguoua oharaotera (e.g., SATAK) should not be any different fron naaing the 
saaa fa«lllar word transcribed with no agbiguous characters (e.g.» BATAK). It 
- la evident, however, that phonological ly ambiguous letter strings were naned, 
in general, aore slowly than were their phonologically unaablgi^ous controls. 
Evidently^ the readers in the present. experloent did not treat words as holis- 
. tlo figural patterns. On the contrary, the data suggest that the beginning 
readers in the experiMnt noticed (more or less) the phonological aspects of a 
printed word that were specified in the details of its orthographic strvxjture. 
In this latter respect the pres«it data replicate with children the observa- 
tions made previously with adults. 

When bialphabetic adult readers of Serbo-Croatian performed a lexical
decision task, letter strings composed of ambiguous and common characters
incurred longer latencies than the unique alphabet transcription of the same
word (Feldman & Turvey, 1983) and, in an analogous naming task, the same
pattern of results occurred (Feldman, 1981). In the adult experiments, words
were selected so as to include a varied distribution in the number and
position of the ambiguous characters within the letter string. Results
indicated that all letter strings that could be assigned both a Roman and a
Cyrillic reading incurred longer latencies than the unique alphabet
transcription of the same word and that the magnitude of the difference
between the ambiguous form of a word and its unique alphabet control depended
on the number and distribution of ambiguous characters in the ambiguous letter
string. These results with phonologically bivalent letter strings were
interpreted as evidence that both lexical decision and naming in
Serbo-Croatian necessarily involve an analysis that is sensitive to phonology
and component orthographic structure. Moreover, in an earlier study, words and
pseudowords composed entirely of common letters (with no ambiguous or unique
letters) were accepted and rejected, respectively, no more slowly than letter
strings that included common and unique letters. Because the distinction
between common letters and ambiguous letters was based on their phonemic
interpretation, this result suggested that it was phonological bivalence
rather than a figure-based alphabetic bivalence that governed the effect (see
Lukatela et al., 1978, 1980, for a complete discussion). In summary, the adult
studies suggested that processes of word recognition were both analytic and
phonological in nature.

The major question of interest in the present study was whether the
magnitude of the difference between the ambiguous forms and their unique
alphabet controls was larger for the more fluent beginning readers: Were the
better beginning readers caused to respond proportionately more slowly (and/or
to commit proportionately more errors) by phonological ambiguity than the
poorer beginning readers? In the present experiment, the measure of reading
skill was the speed with which a reader named unfamiliar, nonsense letter
strings. This speed should be faster, on the average, the better the reader's
decoding skills. However, the answer to the ambiguity question was not
independent of another more general question, namely, the subjects' relative
proficiency in Roman and Cyrillic. Sensitivity to phonological ambiguity
necessarily entails the ability to (automatically) assign phonological
interpretations in two alphabets (see Turvey et al., 1984), an ability that
requires approximately equal familiarity with both alphabets.

The analysis of variance on median reaction times indicated that third
graders named letter strings more slowly and less accurately than fifth
graders. The analysis also revealed that the performance of all children,
third and fifth graders alike, deteriorated on ambiguous letter strings
relative to their unique alphabet controls. The major question of interest,
however, was whether the more skilled beginning reader was more impaired by
phonological ambiguity than the less skilled beginning reader, independent of
general proficiency with each alphabet. Although the interactions of phonology
by grade or of phonology by grade by alphabet were not significant, the
relation between phonological ambiguity and reading skill was evaluated
separately for each level of alphabetic proficiency for several reasons.
First, inspection of the means in Figure 2 indicates that for fifth graders
the effect of phonological ambiguity was constant over alphabet, whereas for
third graders the effect was more exaggerated for those letter strings that
were ambiguous in the Roman alphabet. (A comparison of the variances across
groups of words in Table 3 suggests the same thing.) Second, although neither
of the two-way interactions was significant by the items analysis, one
(alphabet by phonology) was almost significant (p < .06), and the other
(alphabet by grade) was significant (p < .05) by the subjects analysis.
Generally, the chances of obtaining higher order interactions with the
dichotomous grade variable were further reduced by the magnitude of the
variability in the latency data of the third-grade children. Third, the error
data of the third-grade children suggested that their facility with Roman
letter strings in general was not as good as their facility with Cyrillic
letter strings in general, whereas no such bias was evident in the fifth-grade
data.

Apparently, third graders were less proficient in the newly acquired
Roman alphabet and they found it difficult to suppress the unwanted Cyrillic
reading of an ambiguous Roman letter string. For third graders the
first-learned and more familiar (Cyrillic) alphabet tended to dominate their
naming responses. Because a Cyrillic bias would exaggerate the effect of
ambiguous characters in Roman words and reduce the effect of ambiguous
characters in Cyrillic words, it counters any true phonological ambiguity
effect. It appears that the dominance of the first-learned alphabet has waned
considerably by the fifth grade, however. The analysis on wrong alphabet
errors (Table 6) supports this interpretation. It should be remarked, however,
that there is some evidence to suggest an asymmetry between the first- and
second-learned alphabets, with a continued dominance of the first-learned,
that persists, in more difficult tasks, through adulthood (Lukatela, Savić,
Ognjenović, & Turvey, 1978). The literature on interference patterns between
languages using the Stroop and dichotic listening tasks (Magiste, 1984)
provides evidence for a similar asymmetry. The detrimental influence of the
second-learned language on the first-learned language and the influence of the
first-learned language on the second-learned language are symmetric when
proficiency in both languages is balanced, but asymmetric when one language is
dominant. As proposed elsewhere, in terms of the interaction of two symbol
systems bialphabetism may be a limited case of bilingualism (Feldman, 1983;
Lukatela, Savić, Ognjenović, & Turvey, 1978).

As discussed above, speed of naming nonsense letter strings was taken as
the measure of a child's reading skill. Speed of naming nonsense items was
then correlated with the ambiguous-unambiguous latency difference. Larger
differences were associated with faster naming times (Table 4). That is,
faster decoders were slowed proportionately more by phonological ambiguity
than slower decoders. Examination of the sub-correlations revealed that this
significant correlation was carried in largest part by the difference between
words phonologically ambiguous in Cyrillic and their unique alphabet controls.
The difference between words transcribed ambiguously in Roman and their
unambiguous controls did not correlate significantly with the decoding speeds
of third graders (see Table 4). However, as noted above, there was
considerable variance in individual ambiguity scores for Roman materials,
suggesting an inconsistency in the ability of some third graders to handle
letter strings in the newly acquired alphabet. This suggestion is buttressed
by the lack of a correlation between third-grade ambiguity scores in Roman and
in Cyrillic. Apparently, for third graders the basis for the ambiguity effect
in the two alphabets was not the same.

There are two possible reasons for the slower responses to ambiguous
letter strings, in particular to the ambiguous Roman letter strings. One
possible reason, anticipated when selecting children at two grade levels, is
an overall Cyrillic bias when interpreting letter strings that is due to
unequal proficiency with the two alphabets. This bias was restricted to
ambiguous letter strings, however. Those letter strings that included unique
letters were no slower in Roman than in Cyrillic at either level of alphabet
proficiency. Moreover, the latency scores for the third graders who learned
the Roman alphabet first revealed the same pattern. Nevertheless, a Cyrillic
bias is suggested by the latency data on ambiguous words for third graders
and is further supported by the alphabet-by-grade interaction in the analysis
of variance on wrong alphabet errors. The other possible reason is an effect
of two phonological analyses (where permitted) when proficiency with the two
alphabets is equated. This possibility assumes equivalent performance with
Roman and Cyrillic letter strings. The suggestion, therefore, is that a large
ambiguity effect on Roman letter strings occurred for some third graders not
because they were proficient decoders but, rather, because they were
unfamiliar with the Roman alphabet. Put differently, the ambiguity effect
with Roman materials that was manifested by third graders could have
originated from one of two factors, where the two factors relate in opposite
ways to reading skill. By contrast, fifth graders performed equivalently with
ambiguous forms and unique alphabet controls in both the Roman and Cyrillic
alphabets. As such, the fifth-grade data indicate a relation between reading
skills and sensitivity to phonological ambiguity when the assumption of
proficiency in the two alphabets is met.

Finally, one other difference between the skilled and less skilled
beginning reader should be noted. The less skilled beginning reader of
Serbo-Croatian is constrained by the fact that the characters of the Roman
and Cyrillic alphabets belong to independent symbol systems. Table 7 reports
the correlation of decoding speed with the tendency to mix alphabets in
interpreting an ambiguous word. The less skilled decoder who has mastered
both alphabets equally is more apt to ignore the independence of the two
alphabets and to construct part of a word's name on the basis of a Roman
alphabet reading and part on the basis of a Cyrillic alphabet reading.


In summary, the sequential acquisition of two alphabets in the process of
learning to read, in conjunction with some special properties of the
Serbo-Croatian language, permitted an investigation of the facility of
beginning readers with a special variety of phonological analysis. It was
demonstrated with children in the third and fifth grades that naming is
slower and less accurate when a letter string can be assigned two
phonological interpretations than when it can be assigned only one. This
effect of ambiguity was assessed on two forms of the same word and it
replicates earlier results with adults (Feldman, TOI; Feldman et al., 1983;
Feldman & Turvey, 1983).

Two levels of bialphabetic proficiency were examined. The asymmetry of the
effect of phonological bivalence for ambiguous words in Roman as compared
with Cyrillic was reduced as proficiency in each alphabet became equal and
this suggested an analogy with the interaction of the two linguistic codes of
the bilingual. For third graders, evidence of a Cyrillic bias when analyzing
ambiguous letter strings made assessment of the relation between reading
skill and sensitivity to phonological ambiguity equivocal. For fifth graders
who are almost equally proficient in the two alphabets, however, there was
evidence that the more skilled beginning reader was more impaired by
phonological ambiguity than the less skilled beginning reader. In conclusion,
the beginning reader who names letter strings more rapidly is more analytic
in his or her style of reading than the poorer beginning reader.

to language or it may be more gener~
linguistic perception (see Wolford, &

*In one failure to replicate these results (Hall, Wilson, Humphreys,
Tinzmann, & Bowyer, 1983), a criterion for selecting good and poor readers
was the math achievement test score from the Woodcock-Johnson battery
(Woodcock, 1973). This test includes a subtest where word problems are
presented orally so that successful performance on that test must involve
short-term memory abilities. By constraining selection procedures in this
way, all children with short-term memory problems were effectively eliminated
from the Hall et al. study.



'For example, EKCEP can be interpreted either as /ekser/, which means
"nail," or as /ektsep/, which is meaningless. The first form is based on a
Cyrillic reading of EKCEP and the second on a Roman reading. By contrast,
EKSER can only be interpreted in Roman, i.e., /ekser/. Therefore, EKCEP is
phonologically ambiguous and EKSER is phonologically unique. The phonological
representation associated with lexical access is sometimes termed
morphophonological.
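
The two readings described in this footnote can be illustrated with a small sketch. The letter-to-sound tables below are assumptions made for illustration: they cover only the upper-case letters needed for the examples in these footnotes, not the full Roman or Cyrillic alphabets, and the function names are hypothetical. A string counts as phonologically ambiguous here when it is legal in both alphabets and the two readings differ.

```python
# Toy letter-to-sound tables for upper-case print (illustrative subset only).
# Letters missing from a table are treated as not belonging to that alphabet.
CYRILLIC = {"A": "a", "E": "e", "K": "k", "M": "m", "O": "o",
            "B": "v", "C": "s", "P": "r"}
ROMAN = {"A": "a", "E": "e", "K": "k", "M": "m", "O": "o",
         "B": "b", "C": "ts", "P": "p", "R": "r", "S": "s", "V": "v"}


def read_as(letters, table):
    """Return the reading of `letters` in one alphabet, or None if impossible."""
    if any(ch not in table for ch in letters):
        return None
    return "".join(table[ch] for ch in letters)


def readings(letters):
    """Collect every reading that the letter string permits."""
    candidates = {"cyrillic": read_as(letters, CYRILLIC),
                  "roman": read_as(letters, ROMAN)}
    return {name: r for name, r in candidates.items() if r is not None}


def is_phonologically_ambiguous(letters):
    """True when the string has more than one distinct reading."""
    return len(set(readings(letters).values())) > 1


print(readings("EKCEP"))  # both alphabets apply, with different readings
print(readings("EKSER"))  # S and R occur only in the Roman table
```

In this sketch a string built only from letters shared by the two alphabets with the same value (for example, MAMA) yields two identical readings and is therefore not counted as ambiguous.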

*For example, the possible unique alphabet transcriptions of BOPAM were
БОПАМ (in Cyrillic) and VORAM (in Roman) where neither option was lexical.
Therefore, alphabet designation for the unique alphabet transcription of
ambiguous pseudowords was randomly assigned and balanced over items.

VERTICALITY UNPARALLELED*

Ignatius G. Mattingly† and Alvin M. Liberman††



Having long found reason to believe that speech is special, we have,
naturally enough, been surprised at the firmness with which others have
asserted, to the contrary, that speech is just like everything else or, what
comes to the same thing, that everything else is special, too. Apparently,
our claim has run counter to some deeply-held conviction about the nature of
mind. One of Fodor's achievements is that he makes this conviction explicit.
On the orthodox view, as Fodor sees it, mental activities are "horizontally"
organized; arguments for the specialness of speech and language fit better
with the assumption that they are vertical. Of the many observations provoked
by Fodor's lucid analysis of these opposing views, we can here offer only
two. The first has to do with the relations among vertically organized input
systems; the second, with the relations between input systems and output
systems.

Fodor's input systems, being "domain specific" (p. 47), are in parallel,
and their outputs complement each other. Thus, when two modules are sensitive
to the same aspects of a signal, representations from both modules should be
cognitively registered. This assumption is surely plausible for modules, such
as those for shape and color, that compute complementary representations of
the same distal object. But the situation is different for speech. There,
the linguistic module appears to take precedence over the module (or modules)
that look after distal objects that are not linguistic. Given the same aspect
of the signal, the linguistic and the non-linguistic module are able to
compute representations of different distal objects, but if a linguistic
representation is computed, the non-linguistic representation is not
cognitively registered. Consider an example to which Fodor himself alludes
(p. 49): the transition of the third formant during the release of a
consonantal constriction in a consonant-vowel syllable. When artificially
isolated from the rest of the signal, this transition is perceived
non-linguistically, as a chirp or glissando (Mann & Liberman, 1983; Repp,
Milburn, & Ashkenas, 1983). But in its normal acoustic context, the same
transition is not so heard; it simply contributes to the perception of a
distal object that is distinctly linguistic: the place of articulation of the
consonant.

Fodor's account of these facts would be that the isolated transition is
ignored by the linguistic module, but not by the non-linguistic module, which
registers it cognitively as a chirp. His account would also exclude the
possibility that, for the transition in context, the linguistic module would
register a chirp as well as a consonant. For the linguistic module, such a
representation would be at most "intermediate" (p. 55 ff.), and hence
inaccessible to central cognitive processes. (We ourselves doubt that the
linguistic module computes any such representation at all, preferring to
believe, instead, that the earliest representation is an articulatory one.)
But the simple parallel arrangement of the modules that Fodor assumes does
cause trouble, for while it means that "the computational systems that come
into play in the perceptual analysis of speech ... operate only upon acoustic
signals that are taken to be utterances" (p. 49), it does not preclude the
possibility that other systems will operate on these same signals. It
suggests that the transition in context will be registered not only
phonetically, by the linguistic module, but also non-phonetically, by the
non-linguistic module. The listener would, therefore, hear both consonant and
chirp. More generally, and more distressingly, the listener would hear all
speech signals both as speech and as non-speech.

What seems called for is a mechanism that would guarantee the precedence
of speech, but would not constitute a serious weakening of the modularity
hypothesis. This precedence mechanism would insure that, though both the
linguistic and the non-linguistic modules may be active (since speech and
non-speech may occur simultaneously in the world), a signal will be heard as
speech if possible, and otherwise as non-speech, but not as both. It is
rather compelling evidence for the existence of such a mechanism that it can
be defeated under experimental conditions that evade ecological constraints.
This is what occurs in the phenomenon known as "duplex perception" (Liberman,
Isenberg, & Rakerd, 1981; Mann & Liberman, 1983; Rand, 1974). As we have
noted, if a third-formant transition that unambiguously fixes the perception
of a consonant-vowel syllable (for example, either as /da/ or as /ga/) is
extracted and presented in isolation, it sounds like a non-speech chirp. The
remainder of the acoustic pattern, presented in isolation, is perceived as a
consonant-vowel syllable, but in the absence of the transition, the place of
the consonant is ambiguous. When the transition and the remainder are
presented dichotically, a duplex percept results: the chirp is heard at the
ear to which the transition is presented and an unambiguous consonant (/da/ or
/ga/, depending on the transition) at the other ear; the ambiguous remainder
is not also heard (Repp et al., 1983). Thus, the transition is perceived,
simultaneously, as a non-speech chirp and as critical support for the
consonant. Apparently, the precedence mechanism recognizes that the
transition and the remainder belong together, but is also aware that there
are two signal sources, one at each ear, and that only one of them is speech.
It therefore allows both the linguistic module and the non-linguistic module
to register central representations that depend on the formant transition.

How might this precedence mechanism work? An obvious possibility is that
it scans the acoustic input and sorts speech signals from non-speech signals,
routing each to its appropriate module. But such a sorting mechanism would
seriously compromise the modularity view, because, having to cut across
linguistic and non-linguistic domains, it would be blatantly horizontal.
Fortunately for the vertical view, the horizontal compromise appears to be
wrong on empirical grounds.

The point is that a sorting mechanism would require that there be surface
properties of speech that it could exploit. These properties would be
characteristic of speech signals in general, but not of non-speech signals.
Moreover, they would be distinct from those deeper properties that the
linguistic module uses to determine phonetic structure. It is of considerable
interest, then, that while natural speech signals do have certain surface
properties (waveform periodicity, characteristic spectral structure, syllabic
rhythm) that such a mechanism might be supposed to exploit (and that man-made
devices for speech detection do exploit), none of these properties is
essential for a signal to be perceived as speech. Natural speech remains
speech-like, and even more or less intelligible, under many forms of
distortion that destroy these properties (high- and low-pass filtering,
infinite peak clipping, rate-adjustment). And, more tellingly, quite bizarre
methods of synthesis (for example, replacing the formants of a natural
utterance by sine waves with the same trajectories; Remez, Rubin, Pisoni, &
Carrell, 1981) suffice to produce speech-like signals. Thus, speech appears
to be speech, not because of any surface properties that mark it as such, but
entirely by virtue of properties that are deeply linguistic. A signal is
speech if, and only if, the language module can in some degree interpret the
signal as the result of phonetically significant vocal-tract gestures. (In
the same way, there are no surface properties that distinguish grammatical
sentences from ungrammatical ones; a sentence is grammatical if, and only if,
a grammatical derivation can be given for it.) We therefore reject this
horizontal compromise, and consider two other possible precedence mechanisms,
both thoroughly vertical.

The first is an inhibitory precedence mechanism that works across the 
outputs of the modules in this way: If the linguistic module fails to find 
phonetic structure, then the output of the non-linguistic module is fully
registered; if, on the other hand, the linguistic module does find phonetic
structure, the link to the non-linguistic module causes the "corresponding"
parts of its output to be inhibited, but leaves the phonetically irrelevant 
parts unaffected. Such a mechanism is certainly conceivable, and, being a 
central mechanism, would not compromise modularity. It would, however, be 
most unparsimonious. For if the inhibitory mechanism were to know which as- 
pect of the output of the non-linguistic module corresponded to aspects of the 
signal that were treated as speech by the linguistic module, it would have to 
know everything that the two modules know: the relationships between phonetic 
structure and speech signals, as well as the relationship between non-linguis- 
tic objects and non-speech signals. Thus a central mechanism would, in ef- 
fect, duplicate mechanisms of two of the modules. 

Turning, therefore, to the second possible precedence mechanism, we
propose that, while the outputs that modules provide to central processes are
in parallel, their inputs may be in series. That is, one module may filter or
otherwise transform the input signal to another module. We suppose that the
linguistic module not only tracks the changing configuration of the vocal
tract, recovering phonetic structure, but also filters out whatever in the
signal is due to this configuration, including, of course, formant
transitions. What remains (non-linguistic aspects of speech such as voice
quality, loudness, and pitch, as well as unrelated acoustic signals) is
passed on to the non-linguistic module. This supposition is parsimonious, in
that it in no way complicates the computations we must attribute to the
linguistic module; the information needed to perform the filtering is the
same information that is needed to specify the phonetic structure of
utterances (and ultimately the rest of their linguistic structure) to central
processes.

A further point in favor of this serial precedence mechanism is that
something similar appears to be required to explain the operation of other
obvious candidates for modulehood, such as auditory localization, echo
suppression and binocular vision. Consider just the first of these. The
auditory localization module cannot simply be in parallel with other modules
that operate on acoustic signals. Not only do we perceive sound sources
(whether speech or non-speech) as localized (with the help of the auditory
localization module), but we also fail to perceive unsynchronized left- and
right-ear images (with other modules). Obviously, the auditory localization
module does not merely provide information about sound-source locations to 
central cognitive processes; it also provides subsequent modules in the se- 
ries, including the linguistic module, with a set of signals arrayed according 
to the location of their sources in the auditory field. The information need- 
ed to create this array (the difference in time-of-arrival of the various sig- 
nals at the two ears) is identical to the information needed for localization. 
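
The contrast between the parallel arrangement and the serial arrangement of module inputs can be made concrete with a toy sketch. Everything in it is an illustrative assumption rather than a model taken from the text: a signal is represented as a list of named components, and the hypothetical field `phonetic_role` merely stands in for whatever the linguistic module succeeds in interpreting as a vocal-tract gesture (it is not meant as a surface property of the signal).

```python
from typing import List, NamedTuple, Optional, Tuple


class Component(NamedTuple):
    """One aspect of an acoustic signal (toy representation)."""
    name: str
    phonetic_role: Optional[str]  # None if the linguistic module finds no use for it


def linguistic_module(signal: List[Component]) -> Tuple[List[str], List[Component]]:
    """Return the phonetic percepts and the residue left after filtering."""
    percepts = [c.phonetic_role for c in signal if c.phonetic_role is not None]
    residue = [c for c in signal if c.phonetic_role is None]
    return percepts, residue


def nonlinguistic_module(signal: List[Component]) -> List[str]:
    """Register every component it receives as a non-speech percept."""
    return [c.name for c in signal]


def perceive_parallel(signal: List[Component]) -> Tuple[List[str], List[str]]:
    """Both modules receive the same input."""
    phonetic, _ = linguistic_module(signal)
    return phonetic, nonlinguistic_module(signal)


def perceive_serial(signal: List[Component]) -> Tuple[List[str], List[str]]:
    """The non-linguistic module receives only what the linguistic module passes on."""
    phonetic, residue = linguistic_module(signal)
    return phonetic, nonlinguistic_module(residue)


syllable = [
    Component("third-formant transition", "place of articulation"),
    Component("voice quality", None),
]
print(perceive_parallel(syllable))  # the transition is registered twice
print(perceive_serial(syllable))    # the transition is registered only phonetically
```

Under the parallel arrangement the transition reaches both modules, which corresponds to hearing both consonant and chirp; under the serial arrangement only the residue reaches the non-linguistic module.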

Unfortunately, hypothesizing a serial precedence mechanism does not lead
us directly to a full understanding of duplex perception. Until we have
carried out some more experiments, we can only suggest that this phenomenon
may have something to do with the fact that the linguistic module must not
only separate speech from non-speech, but must also separate the speech of
one speaker from that of another. For the latter purpose, it cannot rely
merely on the differences in location of sound sources in the auditory field,
since two speakers may occupy the same location, but must necessarily exploit
the phonetic coherence within the signal from each speaker and the lack of
such coherence between signals from different speakers. It might, in fact,
analyze the phonetic information in its input array into one or more coherent
patterns without relying on location at all, for under normal ecological
conditions, there is no likelihood of coherence across locations. Thus, when
a signal that is not in itself speech (the transition) nevertheless coheres
phonetically with speech signals from a different location (the remainder of
the consonant-vowel syllable), the module is somehow beguiled into using the
same information twice, and duplex perception results.

Our second general observation about Fodor's essay is prompted by the
fact that language is both an input system and an output system. Fodor
devotes most of his attention to input systems and makes only passing mention
(p. 12) of such output systems as those that may be supposed to regulate
locomotion and manual gestures. He thus has no occasion to reflect on the
fact that language is both perceptual and motoric. Of course, other modular
systems are also in some sense both perceptual and motoric, and superficially
comparable, therefore, to language; simple reflexes, for example, or the
system that automatically adjusts the posture of a diving gannet in accordance
with optical information specifying the distance from the surface of the water
(Lee & Reddish, 1981). But such systems must obviously have separate
components for detecting stimuli and initiating responses. It would make no
great difference, indeed, if we chose to regard a reflex as an input system
hard-wired to an output system, rather than as a single "input-output" system.
What makes language (and perhaps some other animal communication systems also)
of special interest is that, while the system has both input and output
functions, we would not wish to suppose that there were two language modules,
or even that there were separate input and output components within a single
module. Assuming nature to have been a good communications engineer, we must
rather suppose that there is but one module, within which corresponding input
and output operations (parsing and sentence-planning; speech perception and
speech production) rely on the same grammar, are computationally similar, and
are executed by the same components. Computing logical form, given
articulatory movements, and computing articulatory movements, given logical
form, must somehow be the same process.

If this is the case, it places a strong constraint on our hypotheses
about the nature of these internal operations. By no means every plausible 
account of language input is equally plausible, or even coherent, as an ac- 
count of language output. The right kind of model would resemble an electri- 
cal circuit, for which the same system equation holds no matter where in the 
circuit we choose to measure "input" and "output" currents.

If the same module can serve both as part of an input system and part of 
an output system, the difference being merely a matter of transducers, then 
the distinction between perceptual faculties and motor faculties (the one 
fence Fodor hasn't knocked down) is perhaps no more fundamental than other 
"horizontal" distinctions. The fact that a particular module is perceptual,
or motoric, or both, is purely "syncategorematic" (p. 15). If so, then the
mind is more vertical than even Fodor thinks it is.